Callee-Saves Register

Subroutines and Control Abstraction

Michael L. Scott, in Programming Language Pragmatics (3rd Edition), 2009

8.6.2 Transfer

To transfer from one coroutine to another, the run-time system must change the program counter (PC), the stack, and the contents of the processor's registers. These changes are encapsulated in the transfer operation: one coroutine calls transfer; a different one returns. Because the change happens inside transfer, changing the PC from one coroutine to another simply amounts to remembering the right return address: the old coroutine calls transfer from one location in the program; the new coroutine returns to a potentially different location. If transfer saves its return address in the stack, then the PC will change automatically as a side effect of changing stacks.

Example 8.57

Switching Coroutines

So how do we change stacks? The usual approach is simply to change the stack pointer register, and to avoid using the frame pointer inside of transfer itself. At the beginning of transfer we push the return address and all of the other callee-saves registers onto the current stack. We then change the sp, pop the (new) return address (ra) and other registers off the new stack, and return:

transfer:
    push all registers other than sp (including ra)
    *current_coroutine := sp
    current_coroutine := r1    -- argument passed to transfer
    sp := *r1
    pop all registers other than sp (including ra)
    return

The data structure that represents a coroutine or thread is called a context block. In a simple coroutine package, the context block contains a single value: the coroutine's sp as of its most recent transfer. (A thread package generally places additional data in the context block, such as an indication of priority, or pointers to link the thread onto various scheduling queues. Some coroutine or thread packages choose to save registers in the context block, rather than at the top of the stack; either approach works fine.)

In Modula-2, the coroutine creation routine initializes the coroutine's stack to look like the frame of transfer, with a return address and register contents initialized to allow a "return" into the beginning of the coroutine's code. The creation routine sets the sp value in the context block to point into this artificial frame, and returns a pointer to the context block. To begin execution of the coroutine, some existing routine must transfer to it.

In Simula (and in the code in Example 8.55), the coroutine creation routine begins to execute the new coroutine immediately, as if it were a subroutine. After the coroutine completes any application-specific initialization, it performs a detach operation. Detach sets up the coroutine stack to look like the frame of transfer, with a return address that points to the following statement. It then allows the creation routine to return to its own caller.

In all cases, transfer expects a pointer to a context block as argument; by dereferencing the pointer it can find the sp of the next coroutine to run. A global (static) variable, called current_coroutine in the code above, contains a pointer to the context block of the currently running coroutine. This pointer allows transfer to find the location in which it should save the old sp.
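The transfer pseudocode above can be mimicked with a toy machine model. The following is only a sketch under stated assumptions: the `Machine` class, its three-register set (`r4`, `r5`, `ra`), the 64-word memory, and the string "addresses" are all hypothetical names invented for illustration, and the context block is modeled as a dictionary holding the single saved sp.

```python
REGS = ["r4", "r5", "ra"]   # assumed register set: two callee-saves registers plus ra


class Machine:
    """Toy model: one flat memory, a downward-growing stack, a handful of registers."""

    def __init__(self, mem_words=64):
        self.mem = [0] * mem_words
        self.regs = dict.fromkeys(REGS, 0)
        self.sp = mem_words              # the initial coroutine's stack starts at the top
        self.current = {"sp": None}      # plays the role of current_coroutine

    def push(self, value):
        self.sp -= 1
        self.mem[self.sp] = value

    def pop(self):
        value = self.mem[self.sp]
        self.sp += 1
        return value

    def create(self, stack_top, entry):
        """Modula-2 style creation: build a frame that looks like transfer's."""
        saved_sp = self.sp
        self.sp = stack_top
        for name in REGS:                # same push order that transfer uses
            self.push(entry if name == "ra" else 0)
        context = {"sp": self.sp}        # context block: the saved sp only
        self.sp = saved_sp
        return context

    def transfer(self, target):
        for name in REGS:                # push all registers other than sp
            self.push(self.regs[name])
        self.current["sp"] = self.sp     # *current_coroutine := sp
        self.current = target            # current_coroutine := r1
        self.sp = target["sp"]           # sp := *r1
        for name in reversed(REGS):      # pop all registers other than sp
            self.regs[name] = self.pop()
        return self.regs["ra"]           # "return" to the restored return address
```

In this model the program counter is never touched directly: the value returned by `transfer` is whatever return address happened to sit on the stack that was switched in, which is the side effect the text describes.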

URL:

https://www.sciencedirect.com/science/article/pii/B9780123745149000185

Code Shape

Keith D. Cooper, Linda Torczon, in Engineering a Compiler (Second Edition), 2012

7.9.2 Saving and Restoring Registers

Under any calling convention, one or both of the caller and the callee must preserve register values. Often, linkage conventions use a combination of caller-saves and callee-saves registers. As both the cost of memory operations and the number of registers have risen, the cost of saving and restoring registers at call sites has increased, to the point where it merits careful attention.

In choosing a strategy to save and restore registers, the compiler writer must consider both efficiency and code size. Some processor features affect this choice. Features that spill a portion of the register set can reduce code size. Examples of such features include register windows on the SPARC machines, the multiword load and store operations on the Power architectures, and the high-level call operation on the VAX. Each offers the compiler a compact way to save and restore some portion of the register set.

While larger register sets can increase the number of registers that the code saves and restores, in general, using these additional registers improves the speed of the resulting code. With fewer registers, the compiler would be forced to generate loads and stores throughout the code; with more registers, many of these spills occur only at a call site. (The larger register set should reduce the total number of spills in the code.) The concentration of saves and restores at call sites presents the compiler with opportunities to handle them in better ways than it might if they were spread across an entire procedure.

Using multiregister memory operations. When saving and restoring adjacent registers, the compiler can use a multiregister memory operation. Many ISAs support doubleword and quadword load and store operations. Using these operations can reduce code size; it may also improve execution speed. Generalized multiregister memory operations can have the same effect.

Using a library routine. As the number of registers grows, the precall and postreturn sequences both grow. The compiler writer can replace the sequence of individual memory operations with a call to a compiler-supplied save or restore routine. Done across all calls, this strategy can produce a significant savings in code size. Since the save and restore routines are known only to the compiler, they can use a minimal call sequence to keep the runtime cost low.

The save and restore routines can take an argument that specifies which registers must be preserved. It may be worthwhile to generate optimized versions for common cases, such as preserving all the caller-saves or callee-saves registers.

Combining responsibilities. To further reduce overhead, the compiler might combine the work for caller-saves and callee-saves registers. In this scheme, the caller passes a value to the callee that specifies which registers it must save. The callee adds the registers it must save to the value and calls the appropriate compiler-provided save routine. The epilogue passes the same value to the restore routine so that it can reload the needed registers. This approach limits the overhead to one call to save registers and one to restore them. It separates responsibility (caller saves versus callee saves) from the cost to call the routine.

The compiler writer must pay close attention to the implications of the various options on code size and runtime speed. The code should use the fastest operations for saves and restores. This requires a close look at the costs of single-register and multiregister operations on the target architecture. Using library routines to perform saves and restores can save space; careful implementation of those library routines may mitigate the added cost of invoking them.
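The combined scheme described above, in which the caller passes a value naming its registers and the callee adds its own before a single call to the save routine, can be sketched as follows. This is an illustrative model only: the bitmask encoding (bit i stands for register ri), the eight-register file, and the names `save_registers`, `restore_registers`, and `callee` are assumptions made for the sketch, not details from the text.

```python
def save_registers(mask, regs, stack):
    """Hypothetical compiler-supplied save routine: spill each register named in mask."""
    for i in range(len(regs)):
        if mask & (1 << i):
            stack.append(regs[i])


def restore_registers(mask, regs, stack):
    """Matching restore routine: reload the same registers in reverse order."""
    for i in reversed(range(len(regs))):
        if mask & (1 << i):
            regs[i] = stack.pop()


def callee(caller_mask, regs, stack):
    callee_mask = 0b0011_0000            # assume the callee overwrites r4 and r5
    mask = caller_mask | callee_mask     # callee adds its registers to the caller's value
    save_registers(mask, regs, stack)    # one call covers both groups
    regs[1] = regs[4] = regs[5] = -1     # procedure body clobbers registers
    restore_registers(mask, regs, stack) # epilogue passes the same value back


regs = list(range(8))                    # r0..r7 hold the values 0..7
stack = []
callee(0b0000_0010, regs, stack)         # caller needs r1 to survive the call
```

After the call, `regs` holds its original contents and `stack` is empty again; the caller never issued a save of its own.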

Section Review

The code generated for procedure calls is split between the caller and the callee, and between the four pieces of the linkage sequence (prologue, epilogue, precall, and postreturn). The compiler coordinates the code in these multiple locations to implement the linkage convention, as discussed in Chapter 6. Language rules and parameter binding conventions dictate the order of evaluation and the style of evaluation for actual parameters. System-wide conventions determine responsibility for saving and restoring registers.

Compiler writers pay particular attention to the implementation of procedure calls because the opportunities are difficult for general optimization techniques (see Chapters 8 and 10) to discover. The many-to-one nature of the caller-callee relationship complicates analysis and transformation, as does the distributed nature of the cooperating code sequences. Equally important, small deviations from the defined linkage convention can cause incompatibilities in code compiled with different compilers.

Review Questions

1.

When a procedure saves registers, either callee-saves registers in its prologue or caller-saves registers in a precall sequence, where should it save those registers? Are all of the registers saved for some call stored in the same AR?

2.

In some situations, the compiler must create a storage location to hold the value of a call-by-reference parameter. What kinds of parameters may not have their own storage locations? What actions might be required in the precall and postcall sequences to handle these actual parameters correctly?

URL:

https://www.sciencedirect.com/science/article/pii/B9780120884780000074

Architecture

Sarah L. Harris, David Money Harris, in Digital Design and Computer Architecture, 2016

Preserved Registers

Code Examples 6.22 and 6.23 assume that all of the used registers (R4, R8, and R9) must be saved and restored. If the calling function does not use those registers, the effort to save and restore them is wasted. To avoid this waste, ARM divides registers into preserved and nonpreserved categories. The preserved registers include R4–R11. The nonpreserved registers are R0–R3 and R12. SP and LR (R13 and R14) must also be preserved. A function must save and restore any of the preserved registers that it wishes to use, but it can change the nonpreserved registers freely.

Code Example 6.24 shows a further improved version of diffofsums that saves only R4 on the stack. It also illustrates the preferred PUSH and POP synonyms. The code reuses the nonpreserved argument registers R1 and R3 to hold the intermediate sums when those arguments are no longer necessary.

Code Example 6.24

Reducing the Number of Preserved Registers

ARM Assembly Code

; R4 = result
DIFFOFSUMS
  PUSH {R4}          ; save R4 on stack
  ADD  R1, R0, R1    ; R1 = f + g
  ADD  R3, R2, R3    ; R3 = h + i
  SUB  R4, R1, R3    ; result = (f + g) - (h + i)
  MOV  R0, R4        ; put return value in R0
  POP  {R4}          ; pop R4 off stack
  MOV  PC, LR        ; return to caller

Recall that when one function calls another, the former is the caller and the latter is the callee. The callee must save and restore any preserved registers that it wishes to use. The callee may change any of the nonpreserved registers. Hence, if the caller is holding active data in a nonpreserved register, the caller needs to save that nonpreserved register before making the function call and then needs to restore it afterward. For these reasons, preserved registers are also called callee-save, and nonpreserved registers are called caller-save.

PUSH (and POP) save (and restore) registers on the stack in order of register number from low to high, with the lowest numbered register placed at the lowest memory address, regardless of the order listed in the assembly instruction. For example, PUSH {R8, R1, R3} will store R1 at the lowest memory address, then R3 and finally R8 at the next higher memory addresses on the stack.
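The ordering rule above can be checked with a small calculation. The sketch below assumes 32-bit (4-byte) registers and a descending stack, and the function name `push_layout` and the starting SP value are made up for illustration; it computes where each register would land rather than executing any ARM code.

```python
def push_layout(sp, reg_numbers):
    """Return (new_sp, {address: register}) for PUSH {reg_numbers} on a 32-bit ARM stack."""
    regs = sorted(set(reg_numbers))           # order in the instruction is irrelevant
    new_sp = sp - 4 * len(regs)               # stack grows toward lower addresses
    layout = {new_sp + 4 * i: f"R{n}" for i, n in enumerate(regs)}
    return new_sp, layout


new_sp, layout = push_layout(0x1000, [8, 1, 3])   # models PUSH {R8, R1, R3}
for address in sorted(layout):
    print(hex(address), layout[address])
```

With SP starting at 0x1000, R1 lands at 0xFF4, R3 at 0xFF8, and R8 at 0xFFC, and SP ends at 0xFF4.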

Table 6.6 summarizes which registers are preserved. R4–R11 are generally used to hold local variables inside a function, so they must be saved. LR must also be saved, so that the function knows where to return.

Table 6.6. Preserved and nonpreserved registers

Preserved                        Nonpreserved
Saved registers: R4–R11          Temporary register: R12
Stack pointer: SP (R13)          Argument registers: R0–R3
Return address: LR (R14)         Current Program Status Register
Stack above the stack pointer    Stack below the stack pointer

R0–R3 and R12 are used to hold temporary results. These calculations typically complete before a function call is made, so they are not preserved, and it is rare that the caller needs to save them.

The convention of which registers are preserved or not preserved is part of the Procedure Call Standard for the ARM Architecture, rather than of the architecture itself. Alternate procedure call standards exist.

R0–R3 are often overwritten in the process of calling a function. Hence, they must be saved by the caller if the caller depends on any of its own arguments after a called function returns. R0 certainly should not be preserved, because the callee returns its result in this register. Recall that the Current Program Status Register (CPSR) holds the condition flags. It is not preserved across function calls.

The stack above the stack pointer is automatically preserved as long as the callee does not write to memory addresses above SP. In this way, it does not modify the stack frame of any other functions. The stack pointer itself is preserved, because the callee deallocates its stack frame before returning by adding back the same amount that it subtracted from SP at the beginning of the function.

The astute reader or an optimizing compiler may notice that the local variable result is immediately returned without being used for anything else. Hence, we can eliminate the variable and simply store it in the return register R0, eliminating the need to push and pop R4 and to move result from R4 to R0. Code Example 6.25 shows this even further optimized diffofsums.

Code Example 6.25

Optimized diffofsums Function Call

ARM Assembly Code

DIFFOFSUMS
  ADD  R1, R0, R1    ; R1 = f + g
  ADD  R3, R2, R3    ; R3 = h + i
  SUB  R0, R1, R3    ; return (f + g) - (h + i)
  MOV  PC, LR        ; return to caller

URL:

https://www.sciencedirect.com/science/article/pii/B9780128000564000066

Scalar Optimizations

Keith D. Cooper, Linda Torczon, in Engineering a Compiler (Second Edition), 2012

10.4 Specialization

In most compilers, the shape of the IR program is determined by the front end, before any detailed analysis of the code. Of necessity, this produces general code that works in any context that the running program might encounter. With analysis, however, the compiler can often learn enough to narrow the contexts in which the code must operate. This creates the opportunity for the compiler to specialize the sequence of operations in ways that capitalize on its knowledge of the context in which the code will execute.

Major techniques that perform specialization appear in other sections of this book. Constant propagation, described in Sections 9.3.6 and 10.8, analyzes a procedure to discover values that always have the same value; it then folds those values directly into the computation. Interprocedural constant propagation, introduced in Section 9.4.2, applies the same ideas at the whole-program scope. Operator strength reduction, presented in Section 10.4, replaces inductive sequences of expensive computations with equivalent sequences of faster operations. Peephole optimization, covered in Section 11.5, uses pattern matching over short instruction sequences to find local improvement. Value numbering, explained in Sections 8.4.1 and 8.5.1, systematically simplifies the IR form of the code by applying algebraic identities and local constant folding. Each of these techniques implements a form of specialization.

Optimizing compilers rely on these general techniques to improve code. In addition, most optimizing compilers contain specialization techniques that specifically target properties of the source languages or applications that the compiler writer expects to encounter. The rest of this section presents three such techniques that target specific inefficiencies at procedure calls: tail-call optimization, leaf-call optimization, and parameter promotion.

10.4.1 Tail-Call Optimization

When the last action that a procedure takes is a call, we refer to that call as a tail call. The compiler can specialize tail calls to their contexts in ways that eliminate much of the overhead from the procedure linkage. To understand how the opportunity for improvement arises, consider what happens when o calls p and p calls q. When q returns, it executes its epilogue sequence and jumps back to p's postreturn sequence. Execution continues in p until p returns, at which point p executes its epilogue sequence and jumps to o's postreturn sequence.

If the call from p to q is a tail call, then no useful computation occurs between the postreturn sequence and the epilogue sequence in p. Thus, any code that preserves and restores p's state, beyond what is needed for the return from p to o, is useless. A standard linkage, as described in Section 6.5, spends much of its effort to preserve state that is useless in the context of a tail call.

At the call from p to q, the minimal precall sequence must evaluate the actual parameters at the call from p to q and adjust the access links or the display if necessary. It need not preserve any caller-saves registers, because they cannot be live. It need not allocate a new AR, because q can use p's AR. It must leave intact the context created for a return to o, namely the return address and caller's ARP that o passed to p and any callee-saves registers that p preserved by writing them into the AR. (That context will cause the epilogue code for q to return control directly to o.) Finally, the precall sequence must jump to a tailored prologue sequence for q.

In this scheme, q must execute a custom prologue sequence to match the minimal precall sequence in p. It only saves those parts of p's state that allow a return to o. The precall sequence does not preserve callee-saves registers, for two reasons. First, the values from p in those registers are no longer live. Second, the values that p left in the AR's register-save area are needed for the return to o. Thus, the prologue sequence in q should initialize local variables and values that q needs; it should then branch into the code for q.

With these changes to the precall sequence in p and the prologue sequence in q, the tail call avoids preserving and restoring p's state and eliminates much of the overhead of the call. Of course, once the precall sequence in p has been tailored in this fashion, the postreturn and epilogue sequences are unreachable. Standard techniques such as Dead and Clean will not discover that fact, because they assume that the interprocedural jumps to their labels are executable. As the optimizer tailors the call, it can eliminate these dead sequences.

With a little care, the optimizer can arrange for the operations in the tailored prologue for q to appear as the last operations in its more general prologue. In this scheme, the tail call from p to q simply jumps to a point farther into the prologue sequence than would a normal call from some other routine.

If the tail call is a self-recursive call (that is, p and q are the same procedure), then tail-call optimization can produce particularly efficient code. In a tail recursion, the entire precall sequence devolves to argument evaluation and a branch back to the top of the routine. An eventual return out of the recursion requires one branch, rather than one branch per recursive invocation. The resulting code rivals a traditional loop for efficiency.
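The effect on a self-recursive tail call can be illustrated at the source level. The pair of functions below is a hand-written sketch, not compiler output: `gcd_recursive` ends in a tail call, and `gcd_loop` is the shape that the optimization would produce, with argument evaluation followed by a branch to the top of the routine. Both function names are chosen for this example, and CPython itself does not perform this transformation.

```python
def gcd_recursive(a, b):
    """Tail-recursive form: the recursive call is the last action taken."""
    if b == 0:
        return a
    return gcd_recursive(b, a % b)       # tail call: nothing remains to do afterward


def gcd_loop(a, b):
    """Shape after tail-call optimization: one frame, one eventual return."""
    while b != 0:
        a, b = b, a % b                  # evaluate the new arguments, branch to the top
    return a                             # single return out of the "recursion"


print(gcd_recursive(48, 18), gcd_loop(48, 18))
```

The loop version reuses one activation for every iteration, which mirrors the text's point that q can use p's AR when p and q are the same procedure.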

10.4.2 Leaf-Call Optimization

Some of the overhead involved in a procedure call arises from the need to prepare for calls that the callee might make. A procedure that makes no calls, called a leaf procedure, creates opportunities for specialization. The compiler can easily recognize the opportunity; the procedure calls no other procedures.

During translation of a leaf procedure, the compiler can avoid inserting operations whose sole purpose is to set up for subsequent calls. For example, the procedure prologue code may save the return address from a register into a slot in the AR. That action is unnecessary unless the procedure itself makes another call. If the register that holds the return address is needed for some other purpose, the register allocator can spill the value. Similarly, if the implementation uses a display to provide addressability for nonlocal variables, as described in Section 6.4.3, it can avoid the display update in the prologue sequence.

The other reason to store the return address is to allow a debugger or a performance monitor to unwind the call stack. When such tools are in use, the compiler should leave the save operation intact.

The register allocator should try to use caller-saves registers before callee-saves registers in a leaf procedure. To the extent that it can leave callee-saves registers untouched, it can avoid the save and restore code for them in the prologue and epilogue. In small leaf procedures, the compiler may be able to avoid all use of callee-saves registers. If the compiler has access to both the caller and the callee, it can do better; for leaf procedures that need fewer registers than the caller-save set includes, it can avoid some of the register saves and restores in the caller as well.

In addition, the compiler can avoid the runtime overhead of activation-record allocation for leaf procedures. In an implementation that heap allocates ARs, that cost can be significant. In an application with a single thread of control, the compiler can statically allocate the AR of any leaf procedure. A more aggressive compiler might allocate one static AR that is large enough to work for any leaf procedure and have all the leaf procedures share that AR.

If the compiler has access to both the leaf procedure and its callers, it can allocate space for the leaf procedure's AR in each of its callers' ARs. This scheme amortizes the cost of AR allocation over at least two calls: the invocation of the caller and the call to the leaf procedure. If the caller invokes the leaf procedure multiple times, the savings are multiplied.

10.4.3 Parameter Promotion

Ambiguous memory references prevent the compiler from keeping values in registers. Sometimes, the compiler can prove that an ambiguous value has just one corresponding memory location through detailed analysis of pointer values or array subscript values, or special case analysis. In these cases, it can rewrite the code to move that value into a scalar local variable, where the register allocator can keep it in a register. This kind of transformation is often called promotion. The analysis to promote array references or pointer-based references is beyond the scope of this book. However, a simpler case can illustrate these transformations equally well.

Promotion

A category of transformations that move an ambiguous value into a local scalar name to expose it to register allocation

Consider the code generated for an ambiguous call-by-reference parameter. Such parameters can arise in many ways. The code might pass the same actual parameter in two distinct parameter slots, or it might pass a global variable as an actual parameter. Unless the compiler performs interprocedural analysis to rule out those possibilities, it must treat all reference parameters as potentially ambiguous. Thus, every use of the parameter requires a load and every definition requires a store.

If the compiler can prove that the actual parameter must be unambiguous in the callee, it can promote the parameter's value into a local scalar value, which allows the callee to keep it in a register. If the actual parameter is not modified by the callee, the promoted parameter can be passed by value. If the callee modifies the actual parameter and the result is live in the caller, then the compiler must use value-result semantics to pass the promoted parameter (see Section 6.4.1).
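The difference between the unpromoted and promoted forms can be sketched at the source level. In this illustration a one-element list stands in for a call-by-reference parameter, and the function names are invented for the example; the second function shows value-result passing, with one copy in at entry and one copy out at exit. The rewrite is only valid under the assumption stated in the text, namely that the parameter is unambiguous in the callee.

```python
def sum_into_by_reference(total, values):
    """Unpromoted: every use of the parameter is a load, every definition a store."""
    for v in values:
        total[0] = total[0] + v          # load total[0], add, store total[0]


def sum_into_promoted(total, values):
    """Promoted: the value lives in a local scalar between entry and exit."""
    t = total[0]                         # copy in: a single load at entry
    for v in values:
        t = t + v                        # t is a local; an allocator could keep it in a register
    total[0] = t                         # copy out: a single store (value-result)


a, b = [10], [10]
sum_into_by_reference(a, [1, 2, 3])
sum_into_promoted(b, [1, 2, 3])
print(a, b)
```

Both calls leave 16 in the caller's variable; the promoted version simply touches memory twice instead of once per loop iteration.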

To apply this transformation to a procedure p, the optimizer must identify all of the call sites that can invoke p. It can either prove that the transformation applies at all of those call sites or it can clone p to create a copy that handles the promoted values (see Section 10.6.2). Parameter promotion is most attractive in a language that uses call-by-reference binding.

Section Review

Specialization includes many effective techniques to tailor general-purpose computations to their detailed contexts. Other chapters and sections present powerful global and regional specialization techniques, such as constant propagation, peephole optimization, and operator strength reduction.

This section focused on optimizations that the compiler can apply to the code entailed in a procedure call. Tail-call optimization is a valuable tool that converts tail recursion to a form that rivals conventional iteration for efficiency; it applies to nonrecursive tail calls as well. Leaf procedures offer special opportunities for improvement because the callee can omit major portions of the standard linkage sequence. Parameter promotion is one example of a class of important transformations that remove inefficiencies related to ambiguous references.

Review Questions

1.

Many compilers include a simple form of strength reduction, in which individual operations that have one constant-valued operand are replaced by more efficient, less general operations. The classic example is replacing an integer multiply of a positive number by a series of shifts and adds. How might you fold that transformation into local value numbering?

2.

Inline substitution might be an alternative to the procedure-call optimizations in this section. How might you apply inline substitution in each case? How might the compiler choose the more profitable alternative?

URL:

https://www.sciencedirect.com/science/article/pii/B9780120884780000104

The Procedure Abstraction

Keith D. Cooper, Linda Torczon, in Engineering a Compiler (Second Edition), 2012

6.5 Standardized Linkages

The procedure linkage is a contract between the compiler, the operating system, and the target machine that clearly divides responsibility for naming, allocation of resources, addressability, and protection. The procedure linkage ensures interoperability of procedures between the user's code, as translated by the compiler, and code from other sources, including system libraries, application libraries, and code written in other programming languages. Typically, all of the compilers for a given combination of target machine and operating system use the same linkage, to the extent possible.

The linkage convention isolates each procedure from the different environments found at call sites that invoke it. Assume that procedure p has an integer parameter x. Different calls to p might bind x to a local variable stored in the caller's stack frame, to a global variable, to an element of some static array, and to the result of evaluating an integer expression such as y+2. Because the linkage convention specifies how to evaluate the actual parameter and store its value, as well as how to access x in the callee, the compiler can generate code for the callee that ignores the differences between the runtime environments at the different call sites. As long as all the procedures obey the linkage convention, the details will mesh to create the seamless transfer of values promised by the source-language specification.

The linkage convention is, of necessity, machine dependent. For example, it depends implicitly on information such as the number of registers available on the target machine and the mechanisms for executing a call and a return.

Figure 6.10 shows how the pieces of a standard procedure linkage fit together. Each procedure has a prologue sequence and an epilogue sequence. Each call site includes both a precall sequence and a postreturn sequence.

Figure 6.10. A Standard Procedure Linkage.

Precall Sequence. The precall sequence begins the process of constructing the callee's environment. It evaluates the actual parameters, determines the return address, and, if necessary, the address of space reserved to hold a return value. If a call-by-reference parameter is currently allocated to a register, the precall sequence needs to store it into the caller's AR so that it can pass that location's address to the callee.

Many of the values shown in the diagrams of the AR can be passed to the callee in registers. The return address, an address for the return value, and the caller's ARP are obvious candidates. The first k actual parameters can be passed in registers as well; a typical value for k might be 4. If the call has more than k parameters, the remaining actual parameters must be stored in either the callee's AR or the caller's AR.

Postreturn Sequence. The postreturn sequence undoes the actions of the precall sequence. It must restore any call-by-reference and call-by-value-result parameters that need to be returned to registers. It restores any caller-saved registers from the register save area. It may need to deallocate all or part of the callee's AR.

Prologue Sequence. The prologue for a procedure completes the task of creating the callee's runtime environment. It may create space in the callee's AR to store some of the values passed by the caller in registers. It must create space for local variables and initialize them, as necessary. If the callee references a procedure-specific static data area, it may need to load the label for that data area into a register.

Epilogue Sequence. The epilogue for a procedure begins the process of dismantling the callee's environment and reconstructing the caller's environment. It may participate in deallocating the callee's AR. If the procedure returns a value, the epilogue may be responsible for storing the value into the address specified by the caller. (Alternatively, the code generated for a return statement may perform this task.) Finally, it restores the caller's ARP and jumps to the return address.

This framework provides general guidance for building a linkage convention. Many of the tasks can be shifted between caller and callee. In general, moving work into the prologue and epilogue code produces more compact code. The precall and postreturn sequences are generated for each call, while the prologue and epilogue occur once per procedure. If procedures are called, on average, more than once, then there are fewer prologue and epilogue sequences than precall and postreturn sequences.

More about Time

In a typical system, the linkage convention is negotiated between the compiler implementors and the operating-system implementors at an early stage of the system's development. Thus, issues such as the distinction between caller-saves and callee-saves registers are decided at design time. When the compiler runs, it must emit procedure prologue and epilogue sequences for each procedure, along with precall and postreturn sequences for each call site. This code executes at runtime. Thus, the compiler cannot know the return address that it should store into a callee's AR. (Neither can it know, in general, the address of that AR.) It can, however, include a mechanism that will generate the return address at link time (using a relocatable assembly language label) or at runtime (using some offset from the program counter) and store it into the appropriate location in the callee's AR.

Similarly, in a system that uses a display to provide addressability for local variables of other procedures, the compiler cannot know the runtime addresses of the display or the ar. Nonetheless, it emits code to maintain the display. The mechanism that achieves this requires two pieces of information: the lexical nesting level of the current procedure and the address of the global display. The former is known at compile time; the latter can be determined at link time by using a relocatable assembly language label. Thus, the prologue can simply load the current display entry for the procedure's level (using a loadAO from the display address) and store it into the ar (using a storeAO relative to the arp). Finally, it stores the address of the new ar into the display slot for the procedure's lexical level.
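The display update described above can be modeled in a few lines. This is a minimal sketch, not the book's code: the `Frame` class, the four-slot display, and the names `display_prologue` and `display_epilogue` are all hypothetical, and Python object references stand in for runtime addresses.

```python
class Frame:
    """Toy activation record (ar); only the field needed here is modeled."""
    def __init__(self, name, level):
        self.name = name
        self.level = level                # lexical nesting level, known at compile time
        self.saved_display_entry = None   # slot in the ar for the old display entry

# Global display: one slot per lexical level (four levels is an arbitrary choice).
display = [None] * 4

def display_prologue(frame):
    # Load the current display entry for this level and store it in the new ar.
    frame.saved_display_entry = display[frame.level]
    # Store the address of the new ar into the display slot for this level.
    display[frame.level] = frame

def display_epilogue(frame):
    # Restore the entry that the prologue saved.
    display[frame.level] = frame.saved_display_entry

outer = Frame("outer", 1)
inner = Frame("inner", 2)
inner_again = Frame("inner, recursive activation", 2)

for frame in (outer, inner, inner_again):
    display_prologue(frame)
during_recursion = display[2].name   # the newest level-2 ar is visible

display_epilogue(inner_again)
after_return = display[2].name       # the previous level-2 ar is visible again
```

Saving the old entry in the new ar is what lets a recursive activation at the same level be undone cleanly on return.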

Saving Registers

At some point in the call sequence, any register values that the caller expects to survive across the call must be saved into memory. Either the caller or the callee can perform the actual save; there is an advantage to either choice. If the caller saves registers, it can avoid saving values that it knows are not useful across the call; that knowledge might allow it to preserve fewer values. Similarly, if the callee saves registers, it can avoid saving values of registers that it does not use; again, that knowledge might result in fewer saved values.

Caller-Saves Registers

The registers designated for the caller to save are caller-saves registers.

Callee-Saves Registers

The registers designated for the callee to save are callee-saves registers.

In general, the compiler can use its knowledge of the procedure being compiled to optimize register save behavior. For any specific division of labor between caller and callee, we can construct programs for which it works well and programs for which it does not. Most modern systems take a middle ground and designate a portion of the register set for caller-saves treatment and a portion for callee-saves treatment. In practice, this seems to work well. It encourages the compiler to put long-lived values in callee-saves registers, where they will be stored only if the callee actually needs the register. It encourages the compiler to put short-lived values in caller-saves registers, where it may avoid saving them at a call.
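To make the mixed convention concrete, the sketch below counts the saves needed at one call. The register names and the four-and-four split are invented for illustration, as is the function `registers_saved`; real conventions differ by architecture.

```python
# Hypothetical split of an eight-register machine.
CALLER_SAVES = {"r1", "r2", "r3", "r4"}
CALLEE_SAVES = {"r5", "r6", "r7", "r8"}

def registers_saved(live_across_call, used_by_callee):
    """Return (saved_by_caller, saved_by_callee) for a single call.

    The caller saves only caller-saves registers whose values it still
    needs after the call. The callee saves only callee-saves registers
    that it actually overwrites.
    """
    saved_by_caller = live_across_call & CALLER_SAVES
    saved_by_callee = used_by_callee & CALLEE_SAVES
    return saved_by_caller, saved_by_callee

# Long-lived value kept in callee-saves r5; callee touches only r1 and r2.
good_placement = registers_saved({"r5"}, {"r1", "r2"})

# Same value kept in caller-saves r1 instead; callee happens to use r5.
poor_placement = registers_saved({"r1"}, {"r5"})
```

In the first case no register is saved at all; in the second, both sides perform a save, which is the behavior the placement heuristic in the text tries to avoid.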

Allocating the Activation Record

In the most general case, both the caller and the callee need access to the callee's ar. Unfortunately, the caller cannot know, in general, how large the callee's ar must be (unless the compiler and linker can contrive to have the linker paste the appropriate values into each call site).

With stack-allocated ars, a middle ground is possible. Since allocation consists of incrementing the stack-top pointer, the caller can begin the creation of the callee's ar by bumping the stack top and storing values into the appropriate places. When control passes to the callee, it can extend the partially built ar by incrementing the stack top to create space for local data. The postreturn sequence can then reset the stack-top pointer, performing the entire deallocation in a single step.
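The incremental construction of a stack-allocated ar might look like the following toy model. Everything here is an assumption made for illustration: the `RuntimeStack` class, the function names, and the layout of return address followed by arguments followed by locals.

```python
class RuntimeStack:
    """Toy stack that grows by incrementing its top, as in the text."""
    def __init__(self):
        self.slots = []

    @property
    def top(self):
        return len(self.slots)

def stack_precall(stack, return_address, args):
    # Caller begins the callee's ar: bump the stack top and fill in the
    # values that the caller knows about.
    base = stack.top
    stack.slots.extend([return_address, *args])
    return base

def stack_prologue(stack, num_locals):
    # Callee extends the partially built ar with space for its local data.
    stack.slots.extend([0] * num_locals)

def stack_postreturn(stack, base):
    # Caller resets the stack top, freeing the whole ar in one step.
    del stack.slots[base:]

stack = RuntimeStack()
base = stack_precall(stack, return_address="L_after_call", args=[10, 20])
top_after_precall = stack.top
stack_prologue(stack, num_locals=2)
top_after_prologue = stack.top
stack_postreturn(stack, base)
top_after_postreturn = stack.top
```

Note that the caller never needs to know how many locals the callee allocates; it only remembers where the ar began.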

With heap-allocated ars, it may not be possible to extend the callee's ar incrementally. In this situation, the compiler writer has two choices.

1. The compiler can pass the values that it must store in the callee's ar in registers; the prologue sequence can then allocate an appropriately sized ar and store the passed values in it. In this scheme, the compiler writer reduces the number of values that the caller passes to the callee by arranging to store the parameter values in the caller's ar. Access to those parameters uses the copy of the caller's arp that is stored in the callee's ar.

2. The compiler writer can split the ar into multiple distinct pieces, one to hold the parameter and control data generated by the caller and the others to hold space needed by the callee but unknown to the caller. The caller cannot, in general, know how large to make the local data area. The compiler can store this number for each callee using mangled labels; the caller can then load the value and use it. Alternatively, the callee can allocate its own local data area and keep its base address in a register or in a slot in the ar created by the caller.
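The second choice, in the variant where the callee allocates its own local data area, can be sketched as follows. This is a loose model under stated assumptions: Python dictionaries and lists stand in for heap blocks, and the names `caller_allocate`, `callee_allocate_locals`, and `locals_base` are hypothetical.

```python
def caller_allocate(params, return_address):
    # Piece 1: built by the caller, which knows the parameters and the
    # control data but not the size of the callee's local data area.
    return {
        "params": list(params),
        "return_address": return_address,
        "locals_base": None,   # slot reserved for the callee to fill in
    }

def callee_allocate_locals(ar, num_locals):
    # Piece 2: allocated by the callee, which alone knows how much local
    # space it needs. Its base is kept in a slot of the caller-built piece.
    ar["locals_base"] = [0] * num_locals
    return ar

ar = caller_allocate(params=[7, 8], return_address="L_after_call")
ar = callee_allocate_locals(ar, num_locals=3)
ar["locals_base"][1] = 42   # callee writes to its second local
```

The cost of this split is a second allocation per call, which is part of the overhead the text attributes to heap-allocated ars.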

Heap-allocated ars add to the overhead cost of a procedure call. Care in the implementation of the calling sequence and the allocator can reduce those costs.

Managing Displays and Access Links

Either mechanism for managing nonlocal access requires some work in the calling sequence. Using a display, the prologue sequence updates the display record for its own level and the epilogue sequence restores it. If the procedure never calls a more deeply nested procedure, it can skip this step. Using access links, the precall sequence must locate the appropriate first access link for the callee. The amount of work varies with the difference in lexical level between caller and callee. As long as the callee is known at compile time, either scheme is reasonably efficient. If the callee is unknown (if it is, for example, a function-valued parameter), the compiler may need to emit special-case code to perform the appropriate steps.
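For access links, the work in the precall sequence can be sketched as a short pointer chase. The model below is an assumption-laden illustration: the `AR` class and `access_link_for_callee` are hypothetical names, and it assumes that a callee at level n is declared directly inside a procedure at level n - 1 that is visible from the caller.

```python
class AR:
    """Toy activation record with a lexical level and an access link."""
    def __init__(self, name, level, access_link=None):
        self.name = name
        self.level = level
        self.access_link = access_link   # ar of the lexically enclosing procedure

def access_link_for_callee(caller_ar, callee_level):
    """Find the ar that the callee's access link should point to.

    The number of links to follow depends only on the difference in
    lexical level between caller and callee.
    """
    hops = caller_ar.level - callee_level + 1
    if hops < 0:
        raise ValueError("callee is nested too deeply to be visible from caller")
    target = caller_ar
    for _ in range(hops):
        target = target.access_link
    return target

# main (level 0) encloses p (level 1), which encloses q (level 2).
main_ar = AR("main", 0)
p_ar = AR("p", 1, access_link_for_callee(main_ar, 1))   # main calls p
q_ar = AR("q", 2, access_link_for_callee(p_ar, 2))      # p calls q

link_for_q_calling_p = access_link_for_callee(q_ar, 1)  # two hops
link_for_q_calling_q = access_link_for_callee(q_ar, 2)  # one hop
```

When the callee is known at compile time, `hops` is a constant, so the compiler can emit a fixed-length sequence of loads rather than a loop.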

Section Review

The procedure linkage ties together procedures. The linkage convention is a social contract between the compiler, the operating system, and the underlying hardware. It governs the transfer of control between procedures, the preservation of the caller's state and the creation of the callee's state, and the rules for passing values between them.

Standard procedure linkages let us assemble executable programs from procedures that have different authors, that are translated at different times, and that are compiled with different compilers. Procedure linkages allow each procedure to operate safely and correctly. The same conventions let application code invoke system and library calls. While the details of the linkage convention vary from system to system, the basic concepts are similar across most combinations of target machine, operating system, and compiler.

Review Questions

1. What role does the linkage convention play in the construction of large programs? Of interlanguage programs? What facts would the compiler need to know in order to generate code for an interlanguage call?

2. If the compiler knows, at a procedure call, that the callee does not, itself, contain any procedure calls, what steps might it omit from the calling sequence? Are there any fields in the ar that the callee would never need?

URL: https://www.sciencedirect.com/science/article/pii/B9780120884780000062