Ada Compiler Evaluation System
Reader's Guide
for
Version 2.1
FINAL
Contract Number F33600-92-D-0125
CDRL A0014
Prepared for:
High Order Language Control Facility
Business Software Support Branch
88 CG/SCTL
Wright-Patterson AFB OH 45433-5707
Prepared by:
CTA INCORPORATED
5100 Springfield Pike, Suite 100
Dayton, OH 45431
Availability of the Ada Compiler Evaluation System
The ACES software and documentation are available by anonymous FTP from the host "sw-eng.falls-church.va.us" in the directory "public/AdaIC/testing/aces/v2.0" and from other Ada-related hosts. Document files are included in PostScript format and as ASCII text. The ACES files are also accessible via the World-Wide Web. The home page URL is "http://sw-eng.falls-church.va.us/AdaIC/testing/aces/".
For further information about the ACES, contact the High Order Language Control Facility. As of 1 March 1995, the appropriate contact is:
Mr. Brian Andrews
88 CG/SCTL
3810 Communications, Suite 1
Wright-Patterson AFB, OH 45433-5707
(513) 255-4472
andrewbp@email.wpafb.mil
This Reader's Guide for the Ada Compiler Evaluation System describes how end users can interpret the results of executing the ACES performance tests, the statistical significance of the numbers produced, the organization of the test suite, how to find particular language features and/or specific optimizations, and how to submit error reports and change requests.
LIST OF FIGURES
The following documents are referenced in this guide.
ANSI/MIL-STD-1815A Reference Manual for the Ada Programming Language (LRM)
User's Guide for Ada Compiler Evaluation System (ACES), Version 2.0
High Order Language Control Facility
Business Software Support Branch
88 CG/SCTL
Wright-Patterson AFB OH
Version Description Document (VDD) for
Ada Compiler Evaluation System (ACES), Version 2.0
High Order Language Control Facility
Business Software Support Branch
88 CG/SCTL
Wright-Patterson AFB OH
Primer for Ada Compiler Evaluation System (ACES),
Version 2.0
High Order Language Control Facility
Business Software Support Branch
88 CG/SCTL
Wright-Patterson AFB OH
ISO/IEC 8652 (1995) Programming Language Ada, Language and Standard Libraries (RM 95)
"A Performance Evaluation of the Intel iAPX 432" by Hansen, Linton, Mayo, Murphy and Patterson in Computer Architecture News, Volume 10, Number 4, June 1982.
Ada Evaluation System AES/1
User Introduction to the Ada Evaluation System
Release 2, Version 1, Issue 1,
Crown Copyright, 1990.
Ada Evaluation System AES/2
Reference Manual for the Ada Evaluation Compiler Tests
Release 2, Version 1, Issue 1,
Crown Copyright, 1990.
Ada Evaluation System AES/3
System User Manual Parts 0 and 1
Introduction and General Information
Release 1, Version 1, Issue 2,
Crown Copyright, 1990.
ALGOL60 Compilation and Assessment by B. Wichmann, Academic Press, 1973.
"An Empirical Study of FORTRAN Programs" by D. Knuth in Software: Practice and Experience, Volume 1, Number 2, 1971.
Analysis of Messy Data by George A. Milliken and Dallas E. Johnson Van Nostrand Reinhold Company, New York, 1984.
Applications, Basics and Computation of Exploratory Data Analysis by Velleman and Hoaglin, Durbery Press, 1981.
"Are SPECmarks as Useless as Other Benchmarks?" by
O. Selin, UnixWorld, August 1991.
Biostatistics: A Foundation for Analysis in the Health Sciences by Wayne W. Daniel, John Wiley and Sons, 1978.
Characterizing Computer Performance with a Single Number
by James E. Smith, CACM, Oct. 1988, Vol 31, No. 10.
pp1202-06.
Compilers: Principles, Techniques, and Tools by A. Aho,
R. Sethi, and J. Ullman, Addison-Wesley, 1986.
Computer Architecture A Quantitative Approach by David A. Patterson and John L. Hennessy, Morgan Kaufmann Publishers, Inc. San Mateo, CA. 1990.
"How Not to Lie with Statistics: The Correct Way to Summarize Benchmark Results" by P. Fleming and J. Wallace in CACM, Volume 29, Number 3.
Introduction to Theory of Statistics by A. Mood and F. Graybill, McGraw-Hill, 1963.
Low Level Self-Measurement in Computers by R. Koster, School of Engineering and Applied Science, University of California, Los Angeles, Report Number 69-57, 1969.
Optimizing Supercompilers by Supercomputers M. Wolfe, University of Illinois at Urbana-Champaign, 1982.
Performance and Evaluation of LISP Systems by R. Gabriel, MIT Press, Cambridge, Mass., 1986.
"Performance Variation Across Benchmark Suites" by
C. Ponder, Computer Architecture News, Vol 19 No 4, June 1991.
"Re-evaluation of RISC I" by J. L. Heath in Computer Architecture News, Volume 12, Number 1, March 1984.
"Task Management in Ada - A Critical Evaluation for Real-Time Multiprocessors," by E. Roberts, A. Evans, C. Morgan, and E. Clarke, Software: Practice and Experience, Volume 11, Number 10, October 1981.
"The Keynote Address for the 15th Annual International Symposium on Computer Architecture" by D. Kuck, May 30 - June 3, 1988. Computer Architecture News,
Vol 17 No 1, March 1989.
Understanding Robust and Exploratory Data Analysis by
D. Hoaglin, R. Mosteller, and J. Tukey, John Wiley & Sons,
1983.
This section identifies the Ada Compiler Evaluation System (ACES) product, states its purpose, and summarizes the purpose and contents of this Reader's Guide.
This document is the Reader's Guide for the ACES Software Product. This guide explains the rationale behind the ACES product, objectives of particular tests and groups of tests, and objectives of the assessors and analysis tools. The User's Guide explains how to adapt the product to a particular Ada environment and how to run the tests, the assessors, and the analysis tools. The files and tests are named and described individually in the Version Description Document (VDD). A simplified set of instructions for using the ACES, suitable for use when no unexpected problems arise, is given in the Primer.
This document defines what the ACES test suite and associated tools are to accomplish, their intended users, and why the various parts exist as they do. Additionally, the background and approach of the ACES are defined. The rationale behind the design and development of the various test groups is described, as well as the background of Ada usage influencing the development, and a general perspective of approaches. This document also describes the benefits of different approaches to results analysis, and capabilities of the analysis tools to assist in those approaches.
The ACES is a tool to assist in the evaluation of an Ada compilation system. The basis of the ACES is a series of performance tests and assessors that are designed and built in Ada, to be adapted and executed on an Ada system. The performance tests are designed to measure execution timings and memory storage requirements. Additional tests are included to acquire compilation speed, system capacities, symbolic debugger capabilities, Ada program library functions, and diagnostic error messages. Tools to help with adaptation of the ACES product, as well as to assist in the analysis of the resulting data, are included.
The ACES makes it possible to:
* Compare the performance of several implementations. The Operational Software permits the determination of which is the better performing system for given expected Ada usage.
* Isolate the strong and weak points of a specific system, relative to others which have been tested. Weak points, once isolated, can be enhanced by implementors or avoided by programmers.
* Determine what significant changes were made between releases of a compilation system.
* Predict performance of alternate coding styles. For example, the performance of rendezvous may be such that designers will avoid tasking in their applications. The ACES will provide information to permit users to make such decisions in an informed manner.
* Determine whether the functional capabilities of a symbolic debugger are sufficient to accomplish a set of predefined scenarios. There is a wide variance among compilation systems in types of user interfaces and functional capabilities of symbolic debuggers.
* Determine whether the functional capabilities of a program library management system are sufficient to accomplish a set of predefined scenarios. There is a wide variance among compilation systems in types of user interfaces and functional capabilities for a program library management system.
* Evaluate the clarity and accuracy of a system's diagnostic messages. There are no standards for the format or content of diagnostic messages. The interpretation of the system response to these compilation units will require manual inspection and evaluation.
* Evaluate the system capacity limitations involving the compiler, linker and run-time system.
Different types of users might be interested in the ACES.
* Compiler implementors will want to know the strong and weak points of their system(s). They will also want to be able to assess improvements in their system.
* Compiler selectors are interested in comparing performance across systems to choose the best system for their project.
* Compiler users will want to be able to predict the performance of design approaches. They will also want to be able to isolate the strong and weak points of the compilation system they are using and to monitor performance differences between releases of a compilation system.
The ACES user is expected to already know how to use the Ada compilation system being tested. While this is not always a realistic assumption, it is not feasible to explain how every Ada compilation system which an ACES user might encounter will operate. For details on how to use any particular Ada compilation system, the ACES user is referred to the documentation for the compilation system. Similarly, for details of how a user can perform operations in the host operating system, the user must consult system documentation. In particular, an ACES user is expected to know (or be able to find out from sources other than ACES documentation) how to: use the text editor; construct a command file (script); compile, link and execute an Ada program; delete Ada compilation units from a program library; and in general, how to use the tools provided by the Ada system and the host operating system. Before starting with the ACES proper, users need to perform any Ada program library creation required by their compilation system.
Readers with prior backgrounds in statistics may have an easier time interpreting the ACES output as it deals with confidence intervals, standard deviation, modeling, residuals, robust estimators, significant differences, etc.. However, this document explains the relevant concepts as they arise and contains citations to other works providing more background for interested readers.
The Ada programs provided are generally portable, consistent with the requirement to test all major features of the Ada language. In some cases, a feature is inherently implementation dependent and will have to be adapted to operate on each new system.
For a complete list of system-dependent tests, refer to the VDD Appendix D, "System Dependent Test Problems". The ACES does not restrict itself to a subset of the language to try to increase its portability. For example, there are some test problems which use floating point types declared with 9 digits of precision, although there are several Ada implementations where this size exceeds SYSTEM.MAX_DIGITS. Also, there are some test programs which will not terminate when executed on systems which do not support pre-emptive priority scheduling.
Some test problems may fail to execute on a validated implementation when they violate system capabilities, such as exceeding a capacity limit or not supporting PRAGMA PACK. Each project must decide for itself how serious it considers the failure of any test problem.
The ACES contains a large number of test problems. Most individual problems are fairly small. Many address one language feature or present an example which is particularly well suited to the application of a specific optimization technique.
The focus of the Comparative Analysis (CA) tool is on comparing performance data (execution time, compilation time, link time, compilation plus link time, and code expansion size) from different Ada compilation systems. The measurements reflect the influences of both the compiler (software) and the processor (hardware), and no attempt is made to isolate the relative contribution of the compiler or processor to the overall compilation system performance. It is quite possible that a highly optimizing compiler for a slow target processor will generate code which executes slower than that of a non-optimizing compiler generating code which runs on a fast target processor; in such a case the ACES will report that the second compilation system is faster. This does not imply that the second COMPILER is "inherently better" than the first; running the first compiler on a faster target processor could reverse the conclusion about which compilation system is faster. The CA tool computes "average" performance factors of different compilation systems over all the test problems. CA identifies test problems where any individual system is much slower (or faster) than expected, relative to the average performance of that system over all problems and the average performance of that problem over all compilation systems. ACES users can review the CA report to isolate the weak and strong points of an implementation, by looking for common threads among the test problems which report exceptional performance data. ACES users can also examine the results on a specific system in more detail by running the Single System Analysis (SSA) program.
To be successful, any suite of tests must be satisfactory at each of three independent levels.
* The organization of the test suite and supporting tools.
* The relationships between sets of individual test problems in a test suite.
* The properties of individual test problems.
These are discussed below.
* The organization of the test suite and supporting tools.
Topics of interest at this level are the reporting facilities provided, and identification of the test problems in the suite which address areas of interest.
The ACES Harness program will allow the user to easily identify tests to run and build scripts to run those tests. If the user has a set of tests that he wants to run for a number of systems, he can identify the group of tests in the performance issue file and easily select the entire set of tests at once.
The ACES Comparative Analysis program will compare performance data between systems. It will identify the test problems which show statistically unusual results.
The ACES Single System Analysis program will look at related test results and help isolate the strong and weak points of a system's performance.
An extensive system of indexes and cross reference lists is provided in the VDD appendices that accompany each release. Using these, the test problems which share a particular characteristic can easily be found and their results examined. The VDD appendices are listed in Table 1-2 "Appendices Description".
* The relationships between sets of individual test problems in a test suite.
The primary concern at this level is the breadth and depth of coverage that the test problems in the suite provide. The ACES test suite is designed to provide extensive coverage of language features and common constructions. Version 2.1 provides significant, but not extensive coverage of language features new to Ada 95.
The functional requirements on the ACES for breadth and depth of the coverage are discussed in Section 3 "ORGANIZATION OF THE TEST SUITE", and summarized here.
+ The test suite should contain problems which:
- Address all major syntactic language features.
- Demonstrate the presence (or absence) of particular compiler optimization techniques by comparing results among related test problems. For example, there are several sets of problems where one version is a "hand-optimized" variation of another. If the system executed both versions in the same time, then it was able to recognize the "more general" example. Test problem op_al_alge_simp_04 is the statement "II := LL * 0;" and op_mi_mach_idiom_03 is the statement "II := 0;". If both statements take the same time to execute, it is fair to assume that the compiler has recognized the algebraic simplification possible in op_al_alge_simp_04 and exploited it.
- Represent problems from Ada practice. It is important to include examples of how Ada is actually being used. Practical problems are not designed to be optimized (or to be unoptimizable), they are simply built to get the work done. Some examples are fairly large programs that can give an estimate for the effects of program locality on cache memory usage which is not representative of very small programs. An instance of this type of problem is the KALMAN program in ap_kfm01, which performs a digital space state filter operation.
- Represent classic benchmark problems used in the comparison of other languages. These include programs such as Ackermann's function in programs cl_acm01 - cl_acm02, Whetstone in programs cl_whm01 - cl_whm04, Dhrystone in programs cl_dhm01 - cl_dhm03, and sort programs in cl_som01 - cl_som03. Results from these programs may be available for other languages. (The classical benchmark tests made up the Classical (Cl) group of ACES tests. See Section 3.2.4.1.)
+ Test problems should not generally duplicate other test problems. For consistency checking, some duplication is desirable.
+ Test problems should differentiate between systems. A good test problem will run well on some target systems and poorly on others. Executing a test problem which all systems treat the same does not provide useful information about whether one system is better or worse than another.
Because a limited number of systems were tested prior to this release, a test problem which all these systems treated similarly may show differences when run on additional targets. When there is a "reasonable" expectation that a problem might show differences, it is prudent to retain it in the test suite.
The results of some test problems are of independent interest; e.g., those tests associated with LRM features such as rendezvous, exception propagation, or procedure calls. Relative performance data is not always sufficient. A system may have a relatively fast rendezvous, compared to other Ada implementations, but the absolute time is also important. For example, a real time system may need to cycle every 20 milliseconds to satisfy the applications requirements. An implementation of that application which requires 100 rendezvous will not satisfy its performance requirements when the fastest entry call takes two milliseconds. If all other computations were instantaneous, the program would take 180 more milliseconds to perform the indicated synchronization than are available.
A particular test problem may be related to other problems to expose the presence of specific optimizations. One may be an unoptimizable version of another. If the first were removed from the suite, it would be difficult to compare the tests and reveal the presence of the optimization, even when the first problem may not differentiate between systems.
If all the trial systems perform the specific optimization in a comparable manner, then the two test problems will be serially correlated on all systems tested. However, adding the performance measurement results from an implementation which does not perform the optimization will weaken the correlation between the two test problems, making the two test problems useful and non-redundant.
* The properties of individual test problems.
At this level, the relevant question is whether or not the particular test problem is "well written". The answer to this question depends on the intended purpose of the problem, in addition to the characteristics of the actual code comprising it.
A test problem which can be optimized into a NULL statement is a poor problem if it was intended to expose the performance of addition operators, but may be a well written problem if the intention was to test for potential optimizations. For example, "XX := XX + 0;" would be a poor test for general addition of literal values, but is appropriate to test for algebraic simplification.
Test problems which exhibit the characteristics described below are defined as "poorly written". They are further discussed in Section 6.8 "CORRECTNESS OF TEST PROBLEMS".
+ The problem could be incorrect Ada.
This condition is sufficient to disqualify a problem. Erroneous test problems will be withdrawn.
+ The performance of the test problem could be nonrepeatable.
It might take a different computation path when it is repeatedly executed, falsifying the assumption that the timing loop can execute the problem many times, divide by the number of executions, and obtain a valid time estimate for one execution. If the test problem takes a different amount of execution time on each iteration, a sequence of timing estimates may not converge.
+ A problem intended to test one feature may actually test another.
For example, a problem designed to test passing literal values to a simple function might be expanded inline and folded, making the test problem more a test of possible compile-time optimization than a test of passing literal parameters. While it is important to test for the folding of inlined subprograms, it is also important to test for specifying literal actual parameters when inlining is neither requested nor possible.
Each test problem in the ACES has been compared against these criteria. Also of concern at this level is the accuracy of the timing measurements themselves. The steps the ACES takes to ensure that the timing loop code is accurate are discussed in depth in Section 6 "DETAILS OF TIMING MEASUREMENTS".
For the purpose of running the CA tool when a user has performance data from only one compilation system, the ACES is distributing sample data derived from running CA on the performance data from the trial systems. This represents the "average" performance of the systems tested, and can be useful in detecting differences between the one system the ACES user has data on and a hypothetical "typical" system.
There are three different but complementary ways an ACES user can examine test results. These are discussed in Section 5 "HOW TO INTERPRET THE OUTPUT OF THE ACES", and reviewed here.
* After running all the test programs on several systems, use CA to identify the test problems which have statistically unusual behavior on each system.
* Select a set of test problems which use a set of features of special interest and examine the results of these problems.
* Run the Single System Analysis program.
The VDD indexes and the source text itself provide a rich field for exploration. Depending on their interest, time, and the use they intend to put the results to, it is possible that users may form very detailed hypotheses about combinations of language features which are responsible for particularly slow performance, and the user may construct unique test problems to verify those hypotheses. Many ACES users will not be that interested or have that much time to spend. If the reason they are running the ACES is to choose between two or three compilation systems for one target processor, they may not care if they find that there are not large differences in performance between the systems. The details of the differences may not be important to them.
The important point of this discussion on examining test results is that ACES users will not have to understand the details of a thousand different test problems to obtain useful information from the results.
The Ada Compiler Evaluation System will not answer every question a user might have about Ada compilers; for example, dollar cost is not addressed. The ACES is concerned with evaluating systems, not just compilers. No attempt is made to factor out the contribution of hardware to overall performance. Some ACES users will want to compare different target architectures, in addition to comparing different compilers for the same target. This taxonomy will help to place the ACES tests in perspective and answer user questions about what assistance the ACES will and will not provide.
* Covered
+ Execution-time Efficiency (see Section 3.2 "EXECUTION-TIME EFFICIENCY").
This is the major emphasis of the ACES. Users will want to be able to examine the results of the ACES to study aspects of Ada performance. The ACES helps the user in isolating the particular tests that may be needed by providing indexes which list tests by various criteria.
+ Code-size Efficiency (see Section 3.3 "CODE-SIZE EFFICIENCY").
Code expansion size is an important area of interest for the ACES. This is measured by using the label'ADDRESS attribute. On systems which do not support this feature, users can select the "Get Address" technique as discussed in Section 5.1.1 of the ACES User's Guide to measure code expansion size in other ways, as discussed in Section 6.7 "CODE EXPANSION MEASUREMENT".
Users should be aware that measuring code size by calculating differences of label addresses may produce misleading results. Highly optimizing compilers for pipelined processors may reorder statements in unexpected ways, rendering address differences meaningless. Unexpectedly small or large code sizes should lead the user to examine the object code to determine whether code reordering is responsible.
The ACES gathers size measurements for all the tests along with execution time. While this is not a major thrust, some tests are included which measure data space allocated to objects by using the X'SIZE attribute and comparing the size of the packed objects to the minimum size possible. Sequences of allocation and deallocation in a collection (LRM Chapter 13) are included which will fail if space is not reclaimed. The tests are designed as performance tests but will demonstrate, as a side effect, whether storage reclamation takes place. These results, along with others, are reported in the ancillary data section of the SSA report.
+ Compile-time Efficiency (see Section 3.4 "COMPILE-TIME EFFICIENCY").
One area emphasized in the ACES is compile and link speed. The ACES provides tests that explore the effect of program size and various constructs on the speed.
+ Symbolic Debugger Assessor (see Section 3.6.1 "Symbolic Debugger Assessor").
The ACES provides a set of Debugger Assessor scenarios (programs and sequences of operations to perform) to enable users to evaluate a compilation system's debugger.
+ Program Library Assessor (see Section 3.6.2 "Program Library System Assessor").
The ACES provides a set of Library Assessor scenarios (programs and sequences of operations to perform) to enable users to evaluate a compilation system's library system.
+ Diagnostic Message Assessor (see Section 3.6.3 " Diagnostic Assessor").
The ACES Diagnostic Assessor tests will determine whether a system's diagnostic messages clearly identify an error condition and provide information to correct it, and whether warning messages are generated for various conditions.
+ Capacity Assessor (see Section 3.6.4 "Capacity Assessor").
The ACES provides a set of program generators that, when executed, generate programs to assess the compile-time and run-time capacity limits of an Ada system.
* Not Explicitly Covered
+ Test for Existence of Language Features.
The presence of language features is the charter of the Ada Compiler Validation Capability (ACVC). The ACES test suite assumes that the full Ada language is supported and correctly implemented. The ACES contains tests for the performance of representation clauses and implementation dependent features (LRM Chapter 13 features). (Note that for this manual, LRM will typically represent both the Ada 83 Language Reference Manual and the Ada 95 Reference Manual. When specifically distinguishing between the two, the terms Ada 83 LRM (for Ada 83) and RM (for Ada 95) will be used.) Some test problems will require modification to run on different systems (such as using the PRAGMA INTERFACE and calling on a procedure written in assembler language) and may fail on some Ada implementations which do not support the full language.
The ACES test suite contains a large number of test problems, which are collected (at link time) into a smaller number of test programs, as illustrated in Figure 3-1. A test program may contain more than one test problem. This is helpful in executing the ACES on an embedded target where downloading from the host can take a large amount of time. The time to run the whole test suite can be significantly reduced if more than one problem is downloaded at a time. This can also reduce the total compilation time, since several programs are combined into one which can share various pieces of logic (such as instantiations of TEXT_IO).
Run-time measurements are reported by test problem. These measurements include the time required to execute the problem and the number of 8-bit bytes occupied by the problem's compiled code. There are 1937 test problems that report these measurements when they are executed. (Another 25 test problems exist solely for the purpose of providing compile-time measurements; they are not executed.) Each test problem's code is embedded in measurement code that is part of a procedure. These procedures may be bound singly or in groups to form executable programs of reasonable size. This reduces load time, which is especially important for embedded systems. There is a default binding scheme, or the user may select collections of test problems to be bound together to form executable programs. (See Figure 3-1.) In any case, the measurements for each test problem are calculated and reported individually. Thus, if all test problems execute successfully, there will be 1937 execution time measurements and 1937 code size measurements.
Compile-time measurements include the time required for compilation and the time required for linking. Since the test problems are typically very small, the enclosing measurement and reporting code would be the dominant factor in compile-time measurements. Therefore, compilation time is reported as the total time required to compile all the units forming an executable "main" program, and linking time is reported as the time required to link the units into the executable. (See Figure 3-1.) In order for compile-time measurements from different evaluations to be comparable, these measurements are taken only if the default groupings of test problems into executables are used. The default grouping results in 725 executable programs; thus, if all the programs compile and link successfully, there will be 725 compilation time measurements and 725 linking time measurements. Since the distinction between compilation and linking is not clear for some systems, the Analysis programs also report the total time for compiling and linking each main program.
Each test problem is embedded in a template which, when executed, will report the execution time and the code expansion space for the problem. The template is discussed in Section 6.5 "HOW TEST PROBLEMS ARE MEASURED".
The emphasis of the Performance Test Suite is on determining the execution performance on a target system, both for speed and memory space. To a lesser degree, the ACES will also test for compilation and link speed. The tests are designed to:
* Produce quantitative results, rather than subjective evaluations.
* Be as portable as possible, consistent with the need to test all Ada language features of interest to Mission Critical Computer Resource (MCCR) applications, which are the class of applications of special interest to the ACES.
* Require minimal operator interaction.
* Be comparable between systems, so that a problem run on one system can be directly compared with that problem run on another target.
The best performance predictor for any particular program is the program itself. Only the final program will enable users to determine whether their implementation will satisfy their performance requirements. That being said, it must be recognized that most projects will want to use the ACES before their final program is available, so they can select an appropriate compilation system (compiler and target processor).
Even when a working version of the final program is available, users may find that its performance is not acceptable and want information to assist them in enhancing it. Performance analyzers (also called program profilers) are helpful in isolating areas of a program which account for most of the execution time of a program. However, this does not tell a programmer whether or not the construction used within the program's "hot spots" might be faster if coded using different language constructions. If a project is concerned with selecting which of several implementations to use on a project, this approach requires the final program be ready for testing before the selection of the compilation system. And even if a "very similar" application is available, the effort required to adapt it to each candidate system may be large, involving learning about system dependencies. A compiler implementor would not receive any information about which language features of their system are particularly slow (or fast) relative to other implementations, which they could use to help determine areas where performance improvements should be possible. Also, with only a set of final application programs, programmers have no initial guidance as to which language features or coding techniques they should try to avoid on a particular implementation. The ACES intends to provide such information. To do so, the ACES includes a set of tests which provides a broad scope of coverage of various features of the language.
The ACES is organized as a set of essentially independent test problems, assessors, and analysis tools. It is possible for users to insert additional test problems that they find of particular interest. In fact, by using ACES-defined names, users can easily add up to ten user-defined benchmarks for an Ada 83 system or up to twenty for an Ada 95 system. See Section 4. The analysis tools will process user-defined test problems, as well as ACES-developed problems. Users should consult Section 6 "DETAILS OF TIMING MEASUREMENTS" and the User's Guide for advice on test construction.
The issues addressed and not addressed by the ACES are indicated below.
* Addressed
+ Execution time
+ Code expansion size
+ Compile time
+ Link time
+ Diagnostics
+ Ada program library management systems
+ Symbolic debuggers
+ Capacity limits
* Not addressed
+ Cost
+ Tests for existence of language features
+ Adaptability to a special environment
This includes the ability to modify the Run-Time System (RTS) to fit a special target configuration, which is important to many MCCR projects. Modification to an RTS brings up validation and revalidation questions, and extensive modifications may require revalidation.
+ Presence of support tools
Cross reference listings, variable set/use lists, management support tools, compilation order dependency tools, configuration management applications, and other tools can make a system easier to work with, speed up the program development process, ease the maintenance task, and generally support the program life cycle.
Some target processors have not been explicitly anticipated in the suite of test problems, and a reader may ask how well unconventional target processors will be evaluated, or what extensions to the ACES would be needed to cover them adequately.
* Vector processors - There are test problems which can be vectorized. However, the machine idioms do not completely cover all the cases of interest. In particular, each vector processor architecture places particular constraints on what sequence of operations can and can not be processed with vector instructions. For example: vector registers may be limited in size; the memory span between elements which can be processed in vector mode may be restricted (to 1 in some designs); the efficiency of vector mode instructions on short vectors may be such that scalar processing is more efficient; some conditional processing in vector mode may be supported on some targets; special advice may have to be given to Ada compilers to permit vectorizing (particularly with respect to numeric exception processing in vector mode); and special vectorized library functions may be available (and necessary) to exploit the vector processing capabilities.
* Very Long Instruction Word (VLIW) machine architectures - Machines in this class have different machine idioms than others. It is possible that some of the specific optimization tests designed for other target machines will be falsely reported as being performed. For example, there are several common Subexpression tests where an optimizing compiler may reduce the operation count by generating fewer instructions and taking less time than a version of the problem written as two statements with the common Subexpression merged "by hand;". A VLIW processor may use several parallel instruction units to perform more operations for the "unoptimized" test problem, but still complete it in the same time.
Some care should be used in interpreting results on such a target. There are few Ada compilers available for such targets, so experience is limited.
* Reduced Instruction Set Computer (RISC) - On these processors, the only real difficulty is interpreting the test problems which probe for machine idioms. If there is no Increment Memory instruction on a target, test problems which permit its use are not directly applicable to the target machine. However, the time to increment a counter, or zero memory, or perform other operations which some machines have developed special idioms to support efficiently, are still of interest. Adding to or zeroing a counter is something which many programs need to do, whether it can be done in one machine instruction or not. The time (and space) it takes to perform the operation is the topic of real interest, not the machine instructions used to implement the operation. Special, idiomatic instructions were generated for machines because the designers believed that the instructions would be useful. However, for any particular target Instruction Set Architecture (ISA), a user may be especially interested in how well the target machine instructions are utilized: Are available idioms exploited? To answer this question, specific test problems need to be executed, and if not present, may need to be developed.
* Multicomputer - On a network of independent processors, with or without shared memory, it may be possible to process tasks in parallel for faster execution. In particular, it may be possible for a system to allocate separate Ada tasks to run in parallel on different processors. This may give interesting results, but when comparing performance between a uniprocessor system and a multicomputer, readers will need to understand the target configuration and the number of processors.
* 1750A Extended Memory - Access to MIL-STD-1750A extended memory is being provided by different implementors in different, incompatible ways. It is not the intention of the ACES to explore the performance of the different implementation-dependent extensions to provide access to extended memory on the 1750A. If the ACES tests are run as written, a baseline will be developed which can be used to compare systems. Projects requiring extended memory will generally need to modify the tests to match the extension needed for each implementation they test.
Determining the execution speed of various problems is the primary objective of the ACES.
The ACES includes test problems for all major Ada language features. One test problem will rarely be sufficient to accurately characterize the performance of a language feature. Optimizing compilers can generate different machine code for the same source text statement, depending on the context in which it occurs. The test suite contains sets of test problems which present constructions in different contexts. The results will demonstrate the range of performance associated with a language feature.
For example, consider the simple assignment statement "I := J;" where "I" and "J" are both integers.
* If "J" is a loop invariant in a loop surrounding the assignment, the statement may be moved out of the loop.
* If "I" is not referenced after being assigned to before being reassigned or deallocated, dead code elimination may eliminate the statement.
* If the value of "J" is known to be some constant based on statements executed before this assignment, it may be possible to use a special store constant or clear (if the value is known to be zero) instruction on the target machine.
* If an assignment was made to "J" earlier in the basic block leading to this assignment, it may be available in a register, permitting the compiler code generator to avoid a redundant load from memory of "J".
* If the value of "I" is needed later in the sequence of statements following the assignment, it may be possible to defer the store into "I".
* Establishing addressability to "I" and/or "J" may require the setup of a base register, or it may be possible that the required addressability has been established by other statements. This may be a common requirement if the variable is declared with intermediate scope, neither local nor library scope.
* If "I" has range constraints, various amounts of data flow analysis may be able to determine if "J" must satisfy the constraint or if explicit tests need to be generated. The performance in general will depend on whether "PRAGMA SUPPRESS" has been specified or not.
* If the assignment statement is unreachable, occurring inside the THEN clause of an IF statement determinable at compile time to be false, or directly after a GOTO, RAISE, EXIT, or RETURN statement without a label on the statement, or within a FOR loop whose range is determinable to be NULL at compile time, or within a WHILE loop determinable at compile time to have a false condition, or in a CASE alternative which is determinable at compile time not to be selectable, no code need be generated for it at all.
* If "I" has range constraints and "J" is known at compile time by data flow analysis to be outside the permissible range, the compiler may simply raise an unconditional CONSTRAINT_ERROR.
Some language features are tested with a corresponding systematic variation of contexts to expose the range of performance. Examples include: subprogram calling; task rendezvous; exception processing; arithmetic expressions; and Input/Output (I/O) operations. Although many embedded targets will not support file systems, many Command and Control MCCR applications make intensive use of file systems and the performance of I/O operations is critical to their application performance.
A set of test problems for "PRAGMA PACK" measures both time and space. Some packing methods do not allocate a component so that it will span a storage unit boundary, while some pack as densely as possible. The time to access a component which spans a storage unit is usually greater than when the component does not span a boundary. There are test problems which access a packed component of a record which would span a storage unit boundary (if densely packed), and others that access components which are left and right justified in a storage unit.
For justified components, machine idioms may be able to access the components more efficiently than for general alignments. These tests do not require tailoring to the storage unit sizes on the system under test; however, a test problem which forces a field to span a boundary on one system may not do so on all target systems. Although which component spans a boundary is dependent on the implementation storage unit size, the computation to identify the component can be performed in an implementation-independent manner using the named number SYSTEM.STORAGE_UNIT. There are test problems for the storage unit sizes in common usage: 8, 12, 16, 24, 32, 48, 60, and 64 bits. In addition to measuring the time to perform the test problems accessing packed objects, these test problems use the representation attribute, X'SIZE (LRM Chapter 13, Representation Attributes) to determine the actual bit size of the objects and compare this with the predetermined minimum possible bit size for the object. These sizes will be printed for information; they show the degree of packing the system under test performs.
Major language features are defined to be the features which are expected to require execution time and to have a reasonable expectation of portability. This includes all syntactic constructions except those listed below by LRM chapter number:
* Based Literals, LRM/RM Chapter 2
The ACES assumes that programs will be compiled, and literal values represented in an internal form at execution time. The format of the external representation of a literal value should not make any difference in execution time or space in nonsource-interpretive systems.
* Comments, LRM/RM Chapter 2
The presence of comments in test problems should not make any difference in execution time or space in nonsource-interpretive systems.
* Replacements of Characters, LRM/RM Chapter 2
The use of alternate character forms should not make any difference in the execution time or space in nonsource-interpretive systems.
* Address Clauses, LRM/RM Chapter 13
The ACES will not force users to make system-dependent modifications to test references to variables that are bound to fixed locations by ADDRESS clauses. Tying tasks to interrupts is a special case which is tested, because the performance characteristics are important and likely to be highly variable. For portability, the ACES does not include tests which use an ADDRESS clause to map a variable to a fixed location.
* Machine Code Insertions, LRM/RM Chapter 13
Test problems for machine code insertion are fundamentally tests of the underlying target hardware, rather than of any property of the Ada system. Designing a portable test problem for different target machines would be awkward, and impossible to verify since each user would have to perform the adaptation to the target machine. Even for the same target machine, different compilers may use different subprogram linkage and register usage conventions making a valid code insertion for one system erroneous on another. Some examples:
+ One system may expect that all machine registers are preserved across machine code insertions, while on another it may be permissible to modify various register values. If one compiler leaves a FOR loop index in a register and a piece of inserted machine code modifies the register, results could be peculiar.
+ It may be possible to write a piece of machine code which, when placed within an exception handler for OTHERS, examines the run-time environment provided for debug support and determines the identification of the exception which is being processed and where it was originally raised (line number or subprogram name). This could be a very useful piece of code, but is not likely to be portable.
+ The machine code to establish addressability to library units can differ between implementations; one system may allocate an object to a static (bound at link time to a fixed address) location, where another does not bind it until the package is elaborated. A piece of machine code which assumes the first alternative is performed and loads the address of the object in one instruction into a register to manipulate it will fail totally if the actual address needs to be computed by adding the base address of a package or by following a level of indirection.
+ The layout of records, including the degree of packing of fields, placement of discriminants, and the format and interpretation of descriptors for unconstrained type objects, may differ between implementations. Machine code written assuming one layout may be completely inappropriate for another.
It is possible that the text formats are incompatible between implementations (e.g., different mnemonics for the machine instructions, different placement of instruction fields).
* Interface to Other Languages, LRM Chapter 13 (RM Annex B)
Calls to other languages are implementation dependent and must be adapted to each system. There is one test problem, ms_il_interfac_lang_assem_01, which users must adapt to each target, which calls on a NULL procedure coded in assembler. As such, it is basically a test of interlanguage linkage conventions, and is by no means exhaustive.
This simple control linkage test does not answer many of the interesting questions users have about the performance of a multi-language program. Data structure layouts often differ between implementations, and special access functions may be necessary to reference data structures defined in one language when referenced from the other, or shared data may be restricted to scalar items. Measuring the performance of a simple control transfer will not expose the costs of such data access functions.
Ensuring comparability in interfacing with a NULL assembler procedure is not easy. On many target machines, there are different sets of linkage conventions, of decreasing generality and increasing efficiency. For example, on the DEC VAX the CALLS mechanism is more general, but slower than the JSB mechanism. In "real" applications which call assembler coded subprograms, the linkage convention used will depend on the details of the program design, and represents a major design decision.
Certain predefined pragmas are expected to have an impact on the execution time and space of a program. These include: CONTROLLED, INLINE, OPTIMIZE, PACK, PRIORITY (Ada 83; Ada 95 implementations supporting Annex C), SHARED (Ada 83), ELABORATE, and SUPPRESS. Others are concerned with presentation information (LIST and PAGE). There are test problems which explore the performance effects of specifying those pragmas in the first list.
The ACES requires the availability of certain math functions. Implementations vary considerably in their approach to math libraries, and modifications may be required before using the ACES. It is important that ACES users record all the modifications made to adapt the math library. Implementations may provide:
* The math library defined in Ada 95 (RM Annex A). All Ada 95 systems are expected to provide the packages specified in this Annex.
* A math library with all needed math functions implemented according to the Numerics Working Group (NUMWG) specifications, which is the ideal case. In general, the ACES testing should use a math library similar to that which will be used by projects. If an implementation-provided math library is inaccurate (even if it is fast) an ACES evaluator must judge whether the inaccuracy is sufficiently gross that projects using the compilation system will (or should) replace the provided math library.
Vendor math libraries may provide more precision than the ACES types ZG_GLOB1.FLOAT6 and ZG_GLOB6.FLOAT9 (digits 6 and digits 9, respectively) require. This is not a problem; the accuracy of the library is tested during Pretest and if observed errors exceed NUMWG recommendations a comment is made.
* No math library. In this case, evaluators may use the ACES portable math library "zm_genma.ada", provided as a generic package.
* A math library with limited support. In this case, the adaptation may be more complex because the user may want to use the existing vendor functions, and use the ACES portable version only for the missing functions. This can be done by writing a "pass-through" package which calls the implementation's math functions where available, and the portable ACES math library functions for the others.
In addition, there is the special case of an implementation which requires the ACES portable math library, but has difficulty in processing generic packages the size of zm_genma. See the discussion below for two potential workarounds.
The ACES portable math package, zm_genma is a generic package that is instantiated in package zm_math for type ZG_GLOB1.FLOAT6 and in package zm_dblma for type ZG_GLOB6.FLOAT9.
If the generic package is too large for a system to handle, the compilation system might accept the package if it were "de-genericized". Edit the files as specified in the User's Guide Section 5.1.6.1.4 "Making a Non-generic Math Package". It should be noted that comparing the performance of such a "de-genericized" version with a system which used the provided generic system may provide an unfavorable bias, in that the system supporting the generic version without requiring adaptation may run more slowly than it would have if it had been adapted. This is an example of the importance of recording all modifications.
Another approach to working around size problems in processing the generic package zm_genma is to define subprograms in zm_genma as SEPARATE units, although there is no guarantee that a system which could not process the original units of zm_genma will have any better luck in processing a version with subunits.
Specific optimization test problems include examples where it is easy and where it is more difficult to determine that the optimization is applicable. There are some test problems which perform the same basic operations, but have a modification which either performs the intended optimization in the source text, or precludes the application of the optimization. These examples permit a reader to determine if the optimization is ever performed. The system comparisons performed by CA will not distinguish between the case where all tested systems perform the optimization and the case where none did. The SSA report on optimizations should show which optimizations are performed. These test problems should not be amenable to optimizations other than the one being studied.
The set of Ada language features which, when used in various contexts, might affect the performance (time and space) of the generated code is much too large to consider exhaustively enumerating all permutations of basic features. It is necessary to be selective in the construction of the test problem suite. Sets of related problems have been constructed with variations based on the use of language features expected to demonstrate the presence of specific optimizations.
There are test problems which check for the presence of specific compiler optimizations. More detailed discussions can be found in texts on compiler writing, such as Compilers: Principles, Techniques, and Tools, by A. Aho, R. Sethi, and J. Ullman. The test suite contains problems for the listed techniques. The following subsections describe individual techniques.
This is the recognition that an expression previously evaluated need not be evaluated a second time as long the results of the first calculation are still available.
Although many compilers will recognize common subexpressions in some contexts, the better optimizing compilers do it in more contexts.
Folding is the performing of operations at compile time, including evaluation of arithmetic expressions with static operands (or operands whose values can be determined), and the folding of control structures (for example, no code needs to be generated for a statement of the form IF false THEN RAISE program error; END IF;).
This is the moving of a statement (or a part of a statement, such as evaluating some expression) which is written within a loop, but which does not vary between iterations of the loop, to a place before the loop.
Greater than linear performance gains may be achieved through this technique, since instead of executing the moved code once per loop iteration, it will be executed only once. Some care should be exercised in applying this technique, since a loop which is normally not executed at all (for example, a WHILE loop which is typically initially false) might run slower with loop invariant motion than without it. This problem can usually be minimized by checking the condition that the loop may not be executed at all before evaluating the "moved" code.
This is a replacement of a strong operator with a weaker but potentially faster one. For example, "multiply" can be replaced by an "add" in the expression "I * 2" resulting in the expression "I + I". A frequent and profitable application is the reduction of FOR loop indexes used as subscripts in arrays (which produce multiplication of the index by a constant reflecting the span between logically consecutive elements of one dimension of an array) with addition.
An assignment to a variable which is not referenced again before being either reassigned or deallocated is dead and need not be actually performed. While there are instances in which data is legitimately written to a memory location and never read (such as memory mapped I/O), these cannot happen inside Ada unless the compiler knows that a variable is tied to a particular memory location by an ADDRESS clause (see LRM 13.5; RM 95 13.1).
On machines with multiple general purpose registers, the registers serve as temporary storage which can save the values of expressions between the execution of the simple statements. If used efficiently, significant reductions in the number of instructions required to execute code are possible.
This is the process of switching inner and outer loops. It has a profound effect on the order of execution of statements, and can produce large performance gains.
The code can be transformed to contain more unit-stride array references. That is, the difference in memory addresses between references to an array on consecutive iterations is one. This can permit, or greatly enhance, operations on vector processors.
It can change the number of instructions executed on non-vector processors. For example, when an outer loop is performed 1000 times and an inner loop 5 times, there will be 1001 loop initiations (1000 for the inner loop and 1 for the outer loop). If the loops were switched, only 6 loop initiations will be required (5 for inner loop and 1 for outer loop).
Loop interchange may permit a reduction in paging on virtual memory machines by reducing the size of the working set.
Refer to Optimizing Supercompilers for Supercomputers, by M. Wolf for more discussion.
This is the merging of loops with equivalent bounds. This can reduce the amount of loop overhead in a program.
This is the combination of tests, best demonstrated by example. Consider the code:
IF I > 0 THEN ... ELSIF I = 0 ... ELSE ... END IF;
On most machines, the comparison for "I > 0" will set condition codes which can be retested for the condition "I = 0"; that is, the generated code could be: "..., compare, branch_less_equal to $1, ... $1: branch_not_equal $2, ... $2: ..".
A boolean expression which has a compile-time determinable operand can be simplified. An expression which simplified to "FALSE AND I=J" can be further simplified into "FALSE". DeMorgan's rule can be applied to help simplify boolean expressions.
Various algebraic identities can be used to simplify expressions. For example, it is not necessary to actually multiply by one or add zero.
Evaluating expressions in a non-canonical order often makes it possible to compute the results faster. This can reduce the number of temporaries, permit special form immediate operations, or permit register-memory operands rather than loading a value from memory followed by a register-register operand.
This is also known as "branch tracing".
A jump instruction which goes to another jump can often be simplified into a single jump. For example, when a conditional jump skips over an unconditional jump, the instruction may be rewritten as one conditional jump using the inverse condition.
Programmers will not often write a GOTO which branches to another GOTO. However, a straightforward translation of conditional control structures often produces such sequences of branches.
Code which is not reachable can be eliminated. Statements following an unconditional EXIT, RAISE, GOTO, or RETURN cannot be reached and need not have any code included in the memory load. When the compiler can determine that an IF statement condition is always false, the THEN alternative need not have any code generated for it. Some may argue that since well written programs should not have this type of unreachable code, having a compiler which optimizes it is not very important. However, with optimizing compilers performing both inline expansions and generic instantiations, there are more conditional expressions which can be resolved at compile time than is immediately apparent.
A procedure in a library package which is not reachable from the main program can be eliminated from the load. The reuse of common packages is expected to make this fairly common, since not every program will use every subprogram in a reusable package. For example, few programs will use all the subprograms defined in TEXT_IO.
The performance of some test problems can be enhanced by the good use of special properties of target machines. Examples of such idioms include: special instructions to clear memory; store of small constants; reuse of condition code settings to avoid retesting; use of special "loop" instructions; compare between limit instructions; register usage; add-to-memory instructions; loop instructions which update a counter and test against some limit; memory-to-memory block moves; and memory increment and/or decrement instructions.
If an array will fit within a single target computer word, the compiler may use fast and small inline code to perform various logical operators. For larger and/or dynamically determined sizes of packed arrays, a run-time support library routine will usually be called at execution time to perform the logical operation. This can be considered as a type of machine idiom, using the capabilities of the machine.
The LRM defines several pragmas which can affect the performance (time or space) of the generated code. These include CONTROLLED, INLINE, OPTIMIZE, PACK, PRIORITY (Ada 83; Ada 9X implementations supporting Annex C), SHARED (Ada 83), SUPPRESS, and ELABORATE.
The LRM also permits implementations to define additional pragmas, some of which may affect performance. The ACES is designed to study the performance of Ada implementations as defined by the LRM. It does not contain test problems for all possible implementation-defined pragmas. However, individual users are free to modify the test problems to incorporate additional pragmas and observe their effects. They should execute the tests as distributed so that they may compare results between systems; however, they may be interested in exploring the effects of implementation-defined pragmas. For a selection process, it would be reasonable to experiment with a few problems to find the setting of pragmas giving the best performance, or even better, to observe the effect of the pragma setting planned for use on the project.
It is sometimes possible and profitable to elaborate an object before it is required in a "canonical" order. For example, a library package containing a constant array with static bounds initialized with literal values might be elaborated and initialized before the program is loaded. That array would then not require any overhead when elaborating the package. Depending on the other contents of the package, it may not be necessary to execute any code to elaborate the package. In general, if the bounds are not static, it will be necessary to execute the expressions providing the bounds, at some time (not completely specified by the LRM) after the program is loaded but before any objects in the package are invoked (subprograms, variables referenced, or types used).
Objects initialized with a static aggregate in the declarative region of a library package will, by definition, only be elaborated one time. This makes it impossible to use the normal ACES timing loop to measure the speed of elaborating library units. The special technique used to measure library package elaboration is discussed in Section 3.2.7.5 "Elaboration".
Frequently, arrays (or records) are initialized with an aggregate which can be evaluated at compile time, often with literal values specified. This permits several optimization techniques to be applied.
One approach to aggregates is to consider them as a sequence of individual assignments. However, for a static aggregate, it may be better to define a copy in memory and do a block move into the object to be initialized; block moves are often much faster and shorter than a sequence of load and store instructions. If a static aggregate is used to initialize a CONSTANT object (or if the compiler verifies that no assignments are made to the object), an optimizing compiler may make all references to the object refer to the pre-allocated static block, and save both the space for the object on the stack and the time to copy the initial values into it.
Many application designs which use tasking can use a static task structure, creating all tasks when the program is initiated and keeping them active until the program terminates. If an optimizing compilation system, and particularly an RTS, recognizes this, it may be able to save a significant amount of space and time. Space can be saved in the RTS because the internal routines associated with task creating and termination will not be necessary. Time can be saved in two areas. First, the time to create the tasks can be saved. This would be done once on program initiation and is not likely to be significant. Second, the RTS routines associated with rendezvous could be replaced with versions which do not test to verify that a task exits. If the number of rendezvous is large and the RTS is organized so that replacement is feasible, the time savings may be significant.
An optimizing compiler may be able to determine that a task is static either through:
* User declaration, via an implementation-defined pragma; or
* By analysis, observed during compilation and linking that all tasks will be elaborated at the library level.
The definition of Ada tasking provides scope for some optimizations which have no analogue in more traditional languages. This section is entitled Language-Specific Optimizations, although the techniques described here may be applicable to other languages which support facilities similar to Ada.
The Habermann-Nassi transformation for tasking is a technique to reduce the number of task switches required to execute a rendezvous. It does this by executing the code of the rendezvous in the stack frame of the calling task, rather than in the frame of the entered task, when the entering task is ready to accept a rendezvous when the entry call is made.
Habermann observed that many of the tasks that arise in practical applications are of the "server" type, consisting of one or more SELECT statements enclosed in a loop. This structure permits an optimizing compiler to implement the rendezvous by replacing the ACCEPT statement linkage with a less general but more efficient subroutine which implements the required mutual exclusion and synchronization. A more detailed discussion of the approach is contained in the article "Task Management in Ada - A Critical Evaluation for Real-Time Multiprocessors", by E. Roberts, A. Evans, C. Morgan, and E. Clarke, Software: Practice and Experience, Volume 11, Number 10, October 1981.
Elapsed time measurements of DELAY (including DELAY UNTIL) statements requesting positive delay values will not follow the ACES product model for analysis (refer to Section 7.1 for details). It would not be desirable if one system, which is typically twice as fast as another, executed a DELAY statement twice as fast, as to do so would require execution time shorter than the requested DELAY value. Comparing execution times of DELAY statements is also complicated by system dependencies because the precision of the requested DELAY value must be expressed in terms of the system-dependent type DURATION; a request for a 100 millisecond DELAY can be interpreted very differently by different systems because of truncation and rounding required to convert the value into the proper type.
All DELAY test problems are given a special error code to prevent them from being processed as "normal" test problems by CA. The measured execution times for test problems executing DELAY statements are printed for examination. Because the actual elapsed time to complete a DELAY statement can be much larger than the requested value due to quantization and scheduling overheads, this information can be of interest to application programmers.
The measurement of statements with a zero delay value are of particular interest because they can provide insight into system task scheduling and could be given special treatment. These tests are expected to highlight differences between compilation systems.
It is permissible for a compilation system to optimize a literal DELAY 0.0 into a NULL statement. The Ada Uniformity Rapporteur Group (URG) has recommended that implementations consider a "DELAY 0.0;" statement as a scheduling point. In particular, this would require an implementation to determine whether a task has been made abnormal (that is, aborted by another task) and if so, to terminate it. For Ada 95 implementations, such a statement is an abort completion point; for implementations supporting Annex D, it is also a scheduling point.
Individual ACES test problems are designed to determine whether:
* The system treats a "DELAY 0.0;" statement as a NULL statement.
Because it is a permissible interpretation, the ACES should not treat a system which does this as erroneous. Systems which do not treat this as a NULL statement could still provide special handling for a delay statement with a literal zero, which is faster than when the delay value is not determinable at compile time.
* The performance of a "DELAY 0.0;" statement is different depending on whether there are multiple tasks or there is only one active task in the system.
* The system recognizes a "DELAY 0.0;" as a synchronization point for abort statements.
A system which translates this into a NULL statement at compile time might "miss" some synchronization points a programmer thought were present in a task. For a task with a DELAY zero in a loop, it could mean that after the task was made abnormal (that is, aborted by another task) the time until it was terminated could be indefinitely postponed.
* A zero DELAY will force a task switch between equal priority tasks.
By comparing the number of task switches in a series of long running problems, the ACES can determine:
+ Whether the system is using a run-till-blocked or a time-slicing task scheduler. This is tested in problem dt_dp_delay_zero_06x.
+ The time quantum on systems which use a time-slicing scheduler. This is the time interval when the task scheduler will switch between equal priority tasks. Many implementations provide system-dependent features which permit a user to specify the task-specific schedule algorithm:
- A pragma, as in DEC Ada.
- A linker directive, as in ALSYS on the Apollo.
- A system-provided subprogram which can be called at execution time, as in the TLD compiler.
The task-switch time can be inferred by running the same problem with different task scheduler directives for equal priority tasks, such as run-till-blocked or time-sliced with a short quantum. It will be the difference in measured times divided by the difference in the number of task switches.
A NULL statement or a sequence of NULL statements might generate some code. There are test problems to explore this.
st_nu_null_01 contains a single NULL statement.
st_nu_label contains a sequence of labeled NULL statements. It would be interesting to know whether a system generated code for each occurrence of a label.
There are some language constructions which display non-uniform performance. The more of them in a program, the slower the average performance. The simplest example is program size. Some compilers will not generate optimized code as well for the same arithmetic expression in the last statement of a ten thousand statement procedure as when it is the last statement in a ten statement procedure. Fixed size internal compiler tables for optimization can overflow and the compiler simply stops trying to generate optimized code.
Other examples are in the following sections.
The time to perform a rendezvous might degrade as the number of tasks in the system increases.
The time needed to access a variable can vary with the lexical level at which it is declared. If a "static-link" approach to accessing objects in an intermediate (non-local, non-global) scope is used, the time to reference variables will vary based on the number of links which must be followed (and on whether registers have been set up by prior statements containing the values of these links). A "display" will provide essentially constant access times, but the overhead to maintain it on block entry/exit can be higher than with a static-link approach. For more discussion on displays versus static-links, refer to any of several textbooks on compiler construction, for example, Compilers: Principles, Techniques, and Tools, by A. Aho, R. Sethi, and J. Ullman.
The performance of control structures, particularly FOR loops, can vary with the level of nesting. Some compilers try to dedicate registers to hold FOR loop indexes, and as the level of loop nesting increases, the number of available registers decreases, and the time to load and restore environments increases on subprogram calls.
Some implementations try to make access to simple scalar formal parameters quick when there are few of them, passing them in registers, for example. Therefore, access to the first (or last) few formal scalar parameters may be significantly faster than access to other parameters. Also, calling subprograms with only a few parameters may be much faster than calling subprograms with many parameters. Input parameters with default values also need to be tested.
Many target machines have different formats of instructions to be used with different displacements. Code to access a variable which can be reached with a short displacement from a base address can be shorter and faster than code to access a variable requiring a long displacement. When a compiler allocates variables in canonical order, the time to access a variable declared at the beginning of a declarative region might be faster than the time to access a variable declared at the end of the declarative region. An optimizing compiler might allocate variables to memory so that variables frequently referred to are accessible with a short displacement instruction.
Similar effects might be present with respect to access to fields of records.
In many areas of the language, it is possible to speed up the performance of one feature at the cost of slowing another down. One classical example common to block-structured languages is the trade-off between a display and a static chain for access to intermediate lexical-scoped variables. Here a static chain approach trades off faster subprogram linkages for slower access to intermediate scoped variables. Other examples are discussed in the following paragraphs.
There are times when a programmer must make design decisions but the information to determine the best choice for a particular program is not available. Alternately, there are design issues where a programmer may want to know how a system implements various constructions so that a decision to use or avoid the language construction can be made on the basis of quantitative information.
Order of evaluation is discussed under classical optimizations. See Section 3.2.4.1.12 "Order Of Expression Evaluation".
Records with default initializations are provided for in the Ada language. If an explicit initialization is specified for all occurrences of the record type, a good compiler would not have any extra code associated with the default initialization. Where there is extra code generated, users may wish to avoid giving defaults.
In a SELECT statement with several open alternatives, the LRM does not specify which is accepted (unless, in an Ada 95 implementation, a non-default entry querying policy is specified). Several test problems are presented which have multiple open alternatives to see if some implementations have adapted particularly fast (or slow) algorithms to perform the arbitrary selection. They also report on the method used to select open alternatives (lexical, priority based, a "roving" order giving priority to a different alternative each time the SELECT statement is entered, First-In-First-Out, Last-In-First-Out, or some other unanticipated order).
The relative time to access variables declared at library level, local, and intermediate lexical levels can vary between implementations. For intermediate lexical levels, the use of a display, as discussed in Section 3.2.5.2 "Levels of Nesting", can make access times roughly constant; however, this can slow down subprogram linkages. Access to variables declared in the parent unit of a separate subunit may be slower than what would be observed if the source text were textually included in the parent.
Looping constructions can be coded in alternate ways. A WHILE loop can be rewritten as an equivalent simple LOOP with an EXIT WHEN or IF ... THEN EXIT, or as IF ... THEN GOTO ... statements. There are test problems that explore the performance of these variations.
A CASE statement with a dense range of alternatives can be simply and efficiently implemented with a jump table on most current machine architectures. When the range between the first and last alternates is large, a jump table approach can result in very large memory usage, perhaps exhausting the addressable memory of the target machine in one statement. Therefore, "sparse" CASE statements need to be translated as a sequence of tests. An implementation may choose to perform all CASE statements with a sequence of tests. Users will want to know the performance of both dense and sparse CASE statements. The ACES contains examples of each.
A packed array of a subtype might be stored as an array of the base type, or with a size which permits tighter packing. A subtype may require range checking which would not be necessary for objects of the base type (because all possible values are valid). An optimizing compiler may be able to avoid some of the range checks which would appear to be required.
Generic units can either be shared or treated as templates for macro expansion. When a generic unit is instantiated several times, a macro expansion approach can result in more space than a shared code approach. For macro expansion, depending on the actual generic parameters, it may be possible to reduce the time and space of the generated code (for example, by folding or eliminating unreachable code, which is only unreachable with the particular actual parameters specified).
The objective for this set of problems is to examine the performance of tests which elaborate generic packages defined by TEXT_IO. Because of differences in approaches to processing generic instantiations, these test problems are expected to highlight differences between implementations.
TEXT_IO is a predefined library package. It is not generally possible for a user (including the ACES test suite) to modify TEXT_IO to insert code to force time stamping to be recorded; neither is it feasible to produce multiple versions of the package so that each can be elaborated.
It is feasible to measure the time to elaborate instantiations of the generic packages defined in TEXT_IO such as FLOAT_IO, INTEGER_IO, or ENUMERATION_IO.
Because they might be treated differently, there are test problems which contain:
* Sharable and non-sharable instantiations.
* Instantiations performed in library units and in nested units.
It is possible that a compiler will implement an option to perform INLINE substitution and simplification on subunits. The presence of subunits can interfere with optimizations of the parent unit because the compiler must assume that the subunit may modify any or all objects to which it has visibility.
One common approach to handling exceptions for faults detected by hardware, such as divide by zero or numeric overflow, will execute instructions on the entry and exit of each frame to track the exception handler which should be called if an exception is raised. Another approach builds a table of ranges so that when a machine fault is detected, the table can be searched to find the appropriate exception handler.
In one case, there will be additional overhead on each frame entry, but comparatively quick selection of the appropriate handler when a fault is raised. In the other, there will be no overhead on frame entry, but slower processing to find the correct handler after a fault is detected. It is expected that raising an exception will be a relatively rare occurrence, compared to frame entry, and that the trade-off will usually favor the second approach. Users may want to know which approach has been used by an implementation.
The test suite contains test problems constructed to reflect the difference that coding styles (computing the same results through different methods) can make. Examples include the Polynomial Coding Style subgroup of the Applications group (ap_pc) and the If Coding Style subgroup of the Statements group (st_is).
The test suite contains problems addressing those aspects of an Ada RTS which traditionally have been the province of operating systems. Examples include task processing (scheduling, DELAY handling, task creation and termination, aborting) and interrupt processing (obtainable by tying task entries to interrupt addresses); the ability to process hardware exceptions (such as dividing by zero); low level I/O and representation clauses tying objects to specific addresses (for memory mapped I/O or for hardware diagnostics); memory allocation and deallocation (using "new" operators and UNCHECKED_DEALLOCATION); elaboration order (concept not commonly made visible in languages other than Ada); and run-time checking (where Ada provides facilities for checking to be controlled and suppressed, if requested).
The test suite contains problems that explore task processing. This is discussed in Section 3.2.4.3.2 "Tasks".
The test suite contains problems addressing exception handling. This is discussed in Section 3.2.6.1.11 "Exceptions".
I/O tests are described below.
* Tests for asynchronous I/O
On some systems, any I/O operation will halt the program until the I/O completes. The Ada Uniformity Rapporteur Group has recommended that "A TEXT_IO.GET operation from an interactive input device, such as a keyboard, not be permitted to block other tasks from proceeding while waiting for input. Other types of input/output operations should allow the maximum feasible level of overlap, but it is recognized that in some systems, a general implementation of input/output overlap may be unfeasible". This issue is not one of correctness; the LRM does not require, or even discuss, the question of I/O operations blocking other tasks. Developing a portable test problem to determine this question is complicated because if the system halts waiting for I/O to complete, the test problem may not terminate.
The ACES contains test problems which measure the performance of operations in one task while another task is waiting on console input. When run on a target which blocks a program whenever a task waits for I/O completion, this test problem will not terminate until the user enters characters on the console. It is possible that having one task waiting for I/O will slow down the execution of other tasks (without necessarily halting them). This is revealed by comparing the results of executing the same code with and without another task in the system waiting for I/O.
* Tests for console I/O
Performance of console I/O is not intrinsically correlated with file I/O performance. In conventional operating systems, they will be routed to very different device drivers. The speed of console I/O is important to some applications and needs to be independently tested. It is not feasible to test console input without using special test equipment, since those tests would essentially measure operator typing time. These test problems need to be executed interactively, not as a background batch job because they are intended to measure the performance of console output. The problems are implementation dependent. It is possible that not all target systems will support console output, or that not all ACES users will be interested in console output performance. To work as intended, the target system must interpret an ASCII carriage return control character in a output string as affecting the cursor. The LRM does not require this; therefore, these test problems are not necessarily portable.
On many terminals, the time to display a string is a function of the characters to be displayed. The screen management routines of some target systems maintain the current contents of the display and compute the minimal set of terminal commands to modify it into the desired display. For example, a PUT of a string to a line already displaying the same string could have the result that no physical I/O commands are passed to the terminal. Many terminals have special commands to insert characters (sliding the old contents); delete characters; write blanks from the current cursor position to the end of the line; and overwrite characters. Since the time to transmit a character (either data or control) to a terminal can exceed a millisecond, reducing the number of characters transmitted can be a profitable optimization.
Some systems have bit-mapped terminals where the time to display a line is not correlated to the characters in the line or the old contents of the line.
These test problems reveal whether the time to display lines varies greatly with the lines being displayed or the prior contents of the line. To enhance repeatability of measurements, the contents of a line will be the same each time the string is displayed. The test problems first write a line of blanks, then write two strings, in order to observe the time to change between the three display strings. The blanks are written first because on some systems the time to blank a line will be constant and in many of the test problems, the first and second strings are the same, permitting users to attribute differences in test problems to changing between the first and second string.
If the console test problems only evaluated examples where the same string was being redisplayed, it would give an overly-optimistic impression of the performance of systems which omit physical I/O when re-displaying the same string. It would give an overly pessimistic impression of such systems if all the test problems displayed different graphic characters in every position.
* Tests for I/O patterns
Many classes of applications are I/O intensive. Their overall performance is determined primarily by the performance of the file system. The hardware characteristics of the file devices and the efficiency of the target operating system's file processing routines are the primary determinants of the speed of these problems. Disk caching is critical to file I/O performance. Projects which develop I/O-intensive applications will be concerned with performance of I/O primitives, and will not be particularly interested in knowing how the execution time is partitioned between the Ada run-time system, the operating system, and the physical device.
The LRM, by the FORM parameter on OPEN and CREATE procedures, provides a way for a program to specify implementation-dependent options for an external file. A null value is portable and requests the system defaults. On many target systems, specifying non-null values can greatly enhance performance. To ease portability, the ACES generally uses null values for the FORM parameter; however, the defaults will not necessarily result in the best performance. Programs concerned with file I/O performance on a particular target are advised to explore the performance differences resulting from specifying non-null values. Examples of such options which may have performance impacts include requests for multiple buffering, read-after-write checking, read-ahead, contiguous allocation on creation, a file's device name, shared vs. exclusive access, and so on. Time reduction factors on the order of one hundred are commonly achieved by using optimal FORM strings versus the default. The default settings for different compilation systems for the same target may not be comparable.
All the following cases consider records of 100 bytes (or variant records with a maximum size of 100 bytes). Resulting file sizes range from 100_000 to 10_000_000 bytes. On many systems, the speed of operations on small files is significantly different from large files, and it is necessary to include some tests problems on large files. By Management Information System (MIS) standards a 10 megabyte file is not very large, but it is not trivially small either, and it is not realistic to expect all installations interested in running the ACES to have much more free disk space than this.
The ACES contains a set of test problems which produce disk access patterns either typical of important classes of applications or potentially optimizable by common techniques. The set will demonstrate both "typical" behavior and the presence of optimizations.
Typical access patterns include:
+ Sequential
Processing a file in sequential (ascending or descending) order is a common operation in many applications.
+ Random
Some processing can be characterized as stochastic, following different distributions. The file size makes a major difference in these patterns.
- A uniform distribution may be a reasonable approximation to typical access patterns for some applications.
- Several empirical studies have observed that the frequency of access to records in a file corresponds to the 80-20 rule: 20% of the records account for 80% of the activity, and that smaller partitions of the file follow the 80-20 rule recursively. A pattern of references which correspond to this observation will exercise a system in a typical fashion. The direct application of this pattern corresponds to a hashing on the primary key reference, or if records were accessed through a B-tree, the actual pattern would contain frequent references to the directory pages. Because files are rarely organized precisely in frequency order, a model access pattern based on this distribution should map the frequency-based order to a random permutation of the physical order.
+ Cyclic
Many actual reference patterns display a cyclic pattern. For example, an indexed file being used for keyed access will show a typical pattern: Read the root; Read one of the first level directory pages; Read one of the second level directory pages; ...; Read a leaf page. The number of different pages at each directory level is fixed. The pages in the lower levels are referred to frequently. For a four level tree, the root page may be referenced every fourth logical I/O operation, and one of the first level directory pages (of which there will be a few dozen different pages) will also be referenced every fourth operation. This pattern can be exploited by locking the root and perhaps the first level directory pages in memory, or more flexibly by using a disk cache.
+ Combinations
The sequential, random, and cyclic patterns can be combined in different ways. For example, a program may be reading and comparing two sequential files; or a batch update program will process two different files, reading one (the update file) sequentially and making a random access to the second (assuming the master file is hashed).
If disk I/O systems had timing behavior like Random Access Memory (RAM), it would not be necessary to consider patterns of references. Disks have seek times, rotational latencies, and generally much slower access (compared to RAM). There are software techniques to compensate for the disk performance which are effective for some types of access patterns. Common optimization techniques are:
+ Allocating logically contiguous pages of a file to physically contiguous sectors of a disk. This is particularly effective for sequential access patterns because it permits a system to: perform multisector I/Os, transferring a full track at a time minimizing delays for seeking and for latencies; minimize disk head movement when accessing "close by" logical pages of the file; and simplifying and speeding the mapping of logical pages to disk sectors. On some systems, including many UNIX implementations, the file system maintains a directory mapping logical pages of a file to physical disk sectors; and for large files this mapping information is large and forces a (virtual) disk access to the extended file map to refer to logical pages with "large" page numbers. A disk cache will often permit the system to refer to mapping pages without a physical I/O. While this enhances performance, it occupies cache space which would otherwise be available for pages of user files. Other file system designs based on extents can provide a faster mapping between logical file pages and physical disk sectors by allocating blocks of contiguous sectors.
+ A disk cache in memory can speed access by replacing physical disk I/O operations with references to the copy in memory. This can be very effective for access patterns with small "working sets". Examples include directory pages of the file system and high level pages of an indexed file.
+ Specifying multiple buffers on sequential disk files in the same way as is done on tape files, can overlap I/O and processing and reduce the elapsed time required for problem execution. Multiple buffering is applicable independently of contiguous physical allocation; a system can initiate the reading of the next logical page of a sequential file without this page being the next physical sector. When used in combination with contiguous allocation, both techniques will work better.
+ Specifying large blocks on sequential disk files, in the same way as is done on tape files, speeds processing by reducing the number of physical I/O operations required to process a file. Reducing the number of physical I/O operations saves time in the Central Processing Unit (CPU) (the device drivers are executed less often) and a multisector read will not insert a complete disk revolution between sectors as might well happen if two consecutive single sector read requests were issued.
+ It is well known in the business data processing community that batching together a set of updates and sorting them in the same order as the master file can improve performance. Physical I/O operations might be avoided by processing updates to records on the same disk sector with one operation. Checking the contents of a single buffer is sufficient to ensure this. The head movement associated with processing the master file can be minimized by sorting the updates. Instead of each read referring to a random location in the master file, the sorted list of updates would specify a monotonic increasing sequence with the average distance between specified records being approximately the master file size divided by the size of the update batch. The optimizations appropriate to sequential files, such as multiple buffering, may also be applicable here. For large update files, update processing can be viewed as a merge-like process of reading the update file and the master file and searching for matches.
+ The setting of implementation-dependent options can impact performance. For example:
- Access control - Shared access to a file by concurrent users will involve overheads to ensure consistent usage. When a particular file is open reserving exclusive use, processing should be faster because the file system will not need to manipulate record locks.
- Allocating files which will be accessed concurrently to different physical drives and/or onto different disk controllers can minimize head movement and channel contention. Not allocating high frequency access data files to the same device where the operating system files are loaded can also reduce contention. Although the best performance for a program may result if every file is allocated to a different physical disk, many installations will not have enough different devices to do this, or may determine that the best installation-wide performance will be achieved when every program does not allocate files on every disk (file backup is simplified if one project's files are allocated to one device).
- Journaling is the recording of file activity to keep audit trails. It is necessary in some applications that such trails be maintained, and some operating systems may provide for automatic journaling, but it does have a performance cost which can be large.
- Read-after-write checking is often a user-selectable option, permitting the trade-off between performance and the surety resulting from verifying that a disk write was correctly completed.
- Some systems provide an option to assign a file to memory. When available, this can eliminate all physical I/O operations except for initial load and final save, although if loaded into a virtual memory system, there may be paging I/O operations associated with the "RAM FILE" depending on the available physical memory and the load on the system. But the overhead for this I/O is (one hopes) smaller than for disk resident files.
- There may be an option to selectively enable or disable a "write-through" option on a disk cache. When enabled, a physical write would be performed on the disk (and the cache) and the users would be assured that the disk file had been updated in the event of a system crash.
FORM strings are implementation dependent. Programs using default FORM strings will be portable, but are dependent on the implementation (and perhaps the installation and/or the characteristic of the account the program is executed under). Two different compilers which generate code runnable under the same operating system may select different defaults. It is important that users understand that specifying explicit FORM strings on some systems can have very large performance effects (orders of magnitude difference). A project which is seriously concerned with file system performance might establish coding conventions which require all programs to specify FORM strings and may be completely uninterested in the performance obtained by using default values.
These test problems construct and use several test files. These are instantiations of packages SEQUENTIAL_IO and DIRECT_IO for a 100 byte record type. Several sizes of files are used, ranging from 100 records (a 10 KB file) to 100_000 records. The test program generates these files during its initialization. The size is an adaptation parameter so that interested users may explore the performance of different sized files. For many small configurations, finding this much free space will be awkward. Relatively few configurations have ten megabytes of buffer space or disk cache, so the test problems should measure disk performance. Organizations developing applications for large systems should explore performance on files as large as they plan to use.
There are performance tests which focus on memory management issues. Some test the objects explicitly allocated and deallocated by a program via NEW and the instantiation of UNCHECKED_DEALLOCATION. A large set of tests focuses on the management of implicitly allocated objects which are discussed in more detail in the following paragraphs.
Many systems create temporary objects at execution time for Ada statements which do not explicitly contain an allocator. For example, functions returning an unconstrained type will typically allocate space on the heap to contain the function value. Two issues follow from this observation:
* First
Because the performance of storage allocators can vary greatly between systems, test problem results which invoke allocators can have results which vary greatly between systems.
* Second
If the memory space for these objects is not reclaimed for later use, available memory will "leak" as the program runs, and eventually space will be exhausted and the program will crash. This is a particularly nasty problem because:
+ The program source looks correct, and may work flawlessly on other systems, including earlier releases of the same compilation system. Software that was designed for reusability and portability is especially sensitive to this problem.
+ The existence of a problem may not be discovered until late in a project life cycle, perhaps only after the system is made operational. On target systems with gigabytes of virtual memory, it may take a long time to exhaust available memory; not so long that it will not eventually crash, but long enough that the application can pass most testing.
+ If users have suppressed checking, very strange behavior, such as an operating system crash, may result when the system eventually exhausts free space.
An implementation which can execute an Ada statement once but fails when it is executed repeatedly due to failure of implicit storage reclamation is not very robust. A compiler that does this can be validated, but the ACES will downgrade it during evaluation.
There are ACES test problems which:
* Detect if a system allocates and does not reclaim storage. In these cases, error reports against the system should be generated.
* Demonstrate the efficiency of a system in executing statements which (might) implicitly allocate and should deallocate storage. Some systems may be able to perform some of the test problems without allocating from temporary memory. That does not invalidate the test problems because it is desirable to have test problems which demonstrate performance differences between systems.
Because of differences between systems in their efficiency in manipulating temporary memory objects, and the possibility that some systems will be able to support some of the test problems without requiring dynamically allocated temporary memory, these test problems are expected to highlight differences between implementations.
Test problems use a simple strategy to test for storage reclamation. Each suspected language feature is executed 100_000 times in a test problem. If each execution allocates and does not reclaim as much as a few bytes, the space usage will quickly grow and exhaust space on systems without a large amount of usable free space. This will be effective at finding faults on the non-virtual memory systems typical of embedded applications. If the execution of these test problems exhausts space and raises STORAGE_ERROR, then the test problem will process the exception and report the STORAGE_ERROR (and not give an execution-time measurement).
The elapsed time to execute 100_000 Ada statements for some of these tests could be excessive (that is, many hours to run one problem). The problems make a preliminary estimate of the execution time for 100_000 statements and do not attempt it when the estimate is larger than a user modifiable time limit (ZG_GLOB3.EXCESSIVE_TIME).
Any other ACES test problem might also have the side effect of exhausting space when it is executed repetitively inside the timing loop, if it contains a language feature for which the compilation system allocates and does not reclaim space. Such behavior would be reported as a run-time error.
Elaboration in Ada occurs in several contexts which are significantly different with respect to performance, although not with respect to syntax.
For subprograms and blocks, entry into the declarative region implicitly invokes the elaboration of their declarative regions. Testing for this is extensive.
The test suite contains problems which elaborate nested (that is, non-library) packages and which perform sequences of statements similar to those which would be performed by a library package elaboration (that is, calling on explicit allocators to get dynamically determined space).
Several problems use a modified version of the timing loop to measure library package elaboration time. These versions declare multiple (25) library packages with an elaboration order defined by USE clauses and PRAGMA ELABORATE which forces a linear order of elaboration. Using this order, it is possible to place code in the initialization block of the package bodies to perform timing measurements. Although the error bounds obtainable in this way are not nearly as tight as those obtained from the normal timing loop, they can inform the reader of the order of magnitude of the time necessary to elaborate library packages. The elaboration is done so that it is possible to perform some consistency checks.
There are test problems to measure the cost of verifying that the predefined constraints are satisfied.
Some problems contain the same source text where the only difference between the problems is the presence (or absence) of suppression pragmas. Comparison of these problems would reveal possible performance savings by specifying suppression of checking.
Other sets of test problems are designed to test specific aspects of constraint checking code. Examples include: special cases of range checking, such as assigning a literal (whose range is verified at compile time) to a subtype variable; assigning an expression to a variable with a subrange (where data flow analysis can determine that some run-time bounds checking can be suppressed); and suppressing an access check (when the prior statement performed the same check).
Certain test problems are designed to highlight implementation differences in constraint checking and control flow analysis. The test problems which observe optimization of elaboration checks contain several calls on subprograms defined in an external package. An optimizing compiler should be able to combine some of the checking code which verifies that the package body has been elaborated before calls on subprograms defined in the package are performed. These test problems must be compiled without suppressing checking for predefined constraints because they are intended to test the quality of the code generated to perform the checks.
A system must verify that a package body has been elaborated before it is proper to call on a subprogram defined in that package. By using the predefined PRAGMA SUPPRESS (ELABORATION_CHECK), it is possible to avoid all checking code when the system honors the request. However, when no suppression pragma (or comparable compiler option) is specified, an optimizing compiler still has available more efficient options than generating testing code before every call on a subprogram defined in an external package. For example, it might apply some data flow analysis and only generate one test for pre-elaboration of a package in a region independent of how many different calls on subprograms in the package are contained in the region.
The test suite contains test problems representative of how Ada is being used in practice. In most programs, a small fraction of source text accounts for a large fraction of the execution time of the program. The test suite contains examples of such time-critical sections of code extracted from MCCR applications. The performance of an Ada compiler on these examples will be a good estimator of the expected performance on a similar program. These examples were selected to represent typical Ada usage. Neither code complexity nor use of specific language features was a selection criterion. Using such examples could result in test problems which look like "Fortran with semicolons", but if that is the way the language is being used in practice, then the ACES should contain test problems representative of this usage.
The following subsections discuss, in turn, classical benchmarks, Ada in practice, and "ideal" Ada.
The test suite contains classical benchmark programs coded in Ada. Examples include Whetstone, Dhrystone, Livermore Loops, Ackermann's function, GAMM, sieve, puzzle, several sort routines, the eight queens problem, problems from the Computer Family Architecture study (LU, BMT, TARGET, HEAPIFY, and AUTO), and Ada versions of the inner loops discussed in the paper by D. Knuth, "An Empirical Study of FORTRAN Programs" in Software: Practice and Experience, Volume 1, Number 2, 1971. These tests are in the Classical (Cl) group in the ACES test suite.
Typical usage of Ada in actual practice is represented by test problems based on several development projects. The examples below, which are all found in the Application group of test problems, are drawn from:
* A simulator for the E-3A. This is a set of problems drawn from a flight simulator. There are eight test problems from navigation, avionics, and communications. These are in