Implementing Call-by-Reference and Call-by-Name in Lua

This article was first published at medium.com on Oct. 10, 2019

Lua (see https://www.lua.org/) is an amazingly flexible scripting language. Its original bytecode interpreter is not only extremely lean but was also one of the fastest scripting language implementations for many years. And then came LuaJIT.

LuaJIT (see https://luajit.org/) is an alternative implementation of Lua with an integrated, state-of-the-art just-in-time (JIT) compiler. Its bytecode interpreter alone is four times faster than the original PUC Lua implementation, and in JIT mode the speed-up compared to the original implementation is an incredible factor of 30 on average, 18 in the median and 15 in the geometric mean. This makes LuaJIT even ~1.3 times faster than the well-known JavaScript V8 engine when comparing the geometric mean values from this source (also confirmed by other sources, e.g. here). And yet LuaJIT is much smaller and less resource-hungry than the V8 engine.

With all these impressive advantages I asked myself why LuaJIT is not more commonly used as a backend for languages other than Lua, especially for strongly typed languages in the tradition of Algol. MoonScript and Haxe are the only languages I'm aware of which (optionally) compile to Lua source code, but the former is just another dynamic scripting language and the latter supports static typing but is not strongly typed. But what about languages like Pascal, Modula or Oberon, given that the syntax of Lua itself has been strongly influenced by Modula? Could programs written in these languages be translated to Lua source code or LuaJIT bytecode without any modifications?

I have decided to try my luck with Oberon, not because I would consider Oberon to be the best of all programming languages, but because it is sufficiently simple and I had already written an Oberon transpiler in C++.

As it turned out, most of Oberon's constructs can be translated directly into corresponding Lua constructs. Nested procedures and access to local variables and formal parameters of the enclosing procedures are especially easy, because Lua supports this out of the box. Much trickier is the implementation of VAR parameters (i.e. call-by-reference), on which I will concentrate in the rest of this article.

What is call-by-reference?

Let’s first recap what call-by-value means, which is the parameter-passing mechanism implemented by nearly all scripting languages. Here is a Lua example:

-- LuaDemo1
local x = 1
local function increment( y )
  y = y + 1
  print(y)
end
increment(x)
print(x)
-- prints the following:
-- 2
-- 1

As you can see the parameter y receives a copy of the value of variable x when increment() is called. The change to y has no effect on x.

Now let’s see how call-by-reference is implemented in Oberon:

MODULE VarDemo1;
  IMPORT Out;
  VAR x: INTEGER;
  PROCEDURE increment( VAR y: INTEGER );
  BEGIN
    y := y + 1;
    Out.Int(y,0); Out.Ln;
  END increment;
BEGIN
  x := 1;
  increment(x);
  Out.Int(x,0); Out.Ln;
END VarDemo1.
(* prints the following:
   2
   2 *)

As you can see here the procedure increment() has modified the value of the variable x, which was passed as a reference to the procedure call. If you remove the VAR keyword from the declaration, the code behaves exactly like the Lua example. This keyword, therefore, controls whether call-by-reference or call-by-value is applied. There is no such syntax in Lua.

Very few scripting languages support call-by-reference for scalar variables at all; PHP and Perl are the only ones I am aware of that allow a scalar variable to be passed by reference. Lua is definitely not one of them, and neither are JavaScript or Python.

Before we turn to possible call-by-reference implementations in Lua, let’s briefly have a look at call-by-name, which turned out to be easier to implement in Lua than call-by-reference. My currently preferred implementation of call-by-reference in Lua is a specialization of the call-by-name implementation.

What is call-by-name?

The call-by-name mechanism is not widely known or used today. It was introduced with Algol 60 and is also available e.g. in Simula 67. But it is interesting here because it can easily be implemented in Lua, which is yet another indication of how versatile and powerful this language is.

Here is the description of call-by-name from the Revised Report on the Algorithmic Language Algol 60:

Any formal parameter not quoted in the value list is replaced, throughout the procedure body, by the corresponding actual parameter, after enclosing this latter in parentheses wherever syntactically possible. Possible conflicts between identifiers inserted through this process and other identifiers already present within the procedure body will be avoided by suitable systematic changes of the formal or local identifiers involved.

This means if we just pass a local scalar variable by name to a procedure the effect is essentially the same as with call-by-reference. But now have a look at this example in Oberon:

MODULE VarDemo2;
  IMPORT Out;
  VAR x: ARRAY 2 OF INTEGER; i: INTEGER;
  PROCEDURE increment( VAR y: INTEGER );
  BEGIN
    y := y + 1;
  END increment;
  PROCEDURE index(): INTEGER;
  BEGIN
    i := i + 1;
    RETURN i
  END index;
BEGIN
  i := -1;
  x[0] := 1; x[1] := 0;
  increment(x[index()]);
  Out.Int(x[0],0); Out.Ln;
  Out.Int(x[1],0); Out.Ln;
END VarDemo2.
(* prints the following if increment() is call-by-reference:
   2
   0
   prints the following if increment() is call-by-name
   (assuming right-hand of assignment is evaluated first):
   1
   2  *)

If procedure increment() obeys call-by-reference, the expression x[index()] is evaluated only once, when the actual parameter is handed over to the formal parameter. But if increment() were call-by-name (which is not supported by Oberon and is just a thought experiment here), then y would be replaced throughout the procedure body by the expression x[index()], which would therefore be evaluated twice. In consequence, the left-hand side of the assignment would designate a different array element than the right-hand side. Now how can we achieve the same effect in Lua?

Implementing call-by-name in Lua

How call-by-name can be implemented in an Algol 60 compiler is described, for example, in the book Algol-60 Implementation from 1964. An illustrative example in Pascal can be found in the lecture notes by Robert D. Cameron from 2002. Cameron demonstrates how to use a “thunk” procedure to evaluate the expression and then return the address of the designated variable. But in Lua there is no way to get the address of a local variable, so we need another solution.

Lua has two features that make an elegant solution possible. On the one hand, Lua treats functions as first-class values; functions can be declared anonymously in place. On the other hand, Lua functions are closures with lexical scoping, which means they can access the local variables of their enclosing functions even when called after those functions have returned. We can use Lua functions to implement a thunk that doesn't need the address of a local variable but can be used directly to retrieve or set the value of that variable.
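To make the second property concrete, here is a minimal closure sketch of my own (not one of the numbered demos): the returned function keeps access to the local variable n of makeCounter() even after makeCounter() has returned.

-- Closure sketch: the inner function retains access to the enclosing local n
-- even after makeCounter() has returned
local function makeCounter()
  local n = 0
  return function()
    n = n + 1
    return n
  end
end
local c = makeCounter()
print(c()) -- 1
print(c()) -- 2
print(c()) -- 3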

Here again is the Lua example from above, extended by a thunk as described:

-- LuaDemo2
local x = 1
local function increment( y )
  y( true, y() + 1 ) -- y is now a function, no longer a scalar
  print(y())
end
increment( 
  function(set,val) -- thunk function
    if set then x = val else return x end 
  end )
print(x)
-- prints the following:
-- 2
-- 2

As you can see it behaves as required and prints the same values as the Oberon VarDemo1 example from above. We had to modify the body of increment() because y is now a function; the original assignment is replaced by two calls of the thunk function, one to retrieve the value and one to set it. The retrieving call needs no actual parameters; set and val are both nil in this case, which activates the else branch of the thunk's if statement. The setting call, in contrast, has two actual parameters: true and the value to be assigned to x. If we never had to assign nil we could even get away with only the val parameter and check it for nil in the if condition, as sketched below. But two parameters give us more flexibility with very little overhead. The reference to x in the body of the thunk function continues to refer to the local variable x in the scope where the thunk was defined.
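For illustration, here is a minimal sketch of that single-parameter variant (my own variation, not used in the rest of the article); it only works as long as nil is never assigned through the thunk:

-- Single-parameter thunk variant (sketch): a call with an argument sets the
-- variable, a call without arguments reads it; nil must never be assigned
local x = 1
local function increment( y )
  y( y() + 1 )
  print(y())
end
increment(
  function(val)
    if val ~= nil then x = val else return x end
  end )
print(x)
-- prints the following, just like LuaDemo2:
-- 2
-- 2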

Now see how we have to modify the Lua example to achieve the same effect as in the Oberon VarDemo2:

-- LuaDemo3
local x = { 1, 0 }
local i = 0
local function increment( y )
  y( true, y() + 1 )
end 
local function index() 
  i = i + 1
  return i
end
increment( 
  function(set,val) -- thunk function
    if set then x[index()] = val else return x[index()] end 
  end )
print(x[1])
print(x[2])
-- prints the following:
-- 1
-- 2

As you can see this is the result we expect when call-by-name is applied in the Oberon VarDemo2 example. The code is very similar to LuaDemo2 above and the same explanation why it works also applies here. The primary difference is the expression used in the thunk function. In LuaDemo2 there was only a reference to the local variable x. Here we have the same expression x[index()] we handed over to increment() in VarDemo2. Note that i is initialized to 0 here because Lua — in contrast to Oberon — uses one-based array indices.

In this section, I have demonstrated how to implement call-by-name in Lua. Now let’s see what additions are needed to implement call-by-reference.

Implementing call-by-reference in Lua

To implement call-by-reference in Lua we have two possibilities: we can either use multiple return values or an extension of the thunk concept described above.

In Lua, a function can return multiple values at once. The effect of call-by-reference can be simulated in principle by handing over the parameter values changed in the function body as function return values. Here is an example:

-- LuaDemo4
local x = 1
local function increment( y )
  y = y + 1
  return y
end
x = increment(x)
print(x)
-- prints the following:
-- 2

But there are a few complications. On the one hand, the function could already have return values which then have to be coordinated in the function body and on the caller side. On the other hand, actual parameters of the function could be expressions like the one in LuaDemo3; so it is not possible to simply put the same expression on the left side of the assignment because it would then be evaluated twice. To avoid that, we have to use temporary variables and split up the expression.
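As a minimal sketch (my own illustration, not one of the numbered demos), here is how the call from LuaDemo3 could be rewritten in this style; the temporary variable ensures that the selector expression x[index()] is evaluated only once:

-- Multiple-return-value sketch: the VarDemo2 call rewritten with a temporary
local x = { 1, 0 }
local i = 0
local function increment( y )
  y = y + 1
  return y               -- the changed VAR parameter is handed back
end
local function index()
  i = i + 1
  return i
end
local tmp = index()           -- evaluate the index expression exactly once
x[tmp] = increment( x[tmp] )  -- call by value, then write the result back
print(x[1])
print(x[2])
-- prints the following (same as call-by-reference in VarDemo2):
-- 2
-- 0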

The thunk concept has to take care of how many times selector expressions are evaluated as well, but in contrast to the return value-based solution no local temporary variables are needed and the code becomes less complicated (however, at the expense of performance as will be discussed later).

We have already seen in LuaDemo2 that a simple thunk function is a suitable solution for implementing call-by-reference as long as scalar variables are used as actual arguments. We therefore still have to find a solution for the case where the variable is an array element or a record field. In Lua, we need at least a table reference and an index value (array index or field name) to access a specific array element or record field. So we have to add these somehow to the closure which is used as a thunk. Lua does not allow that (at least not through the front door). The solution is to use a table instead of a function but make the table “callable” using a metatable and the __call event (see section 2.8 in the Lua reference manual for more information about metatables). Here is a helper module for such “structured thunks”:

-- StructuredThunks.lua
local module = {}
local function runThunk( table, set, val )
  if set then
    table.table[table.index] = val
  else
    return table.table[table.index]
  end
end
function module.create( table, index )
  local t = { table = table, index = index, __call = runThunk }
  setmetatable( t, t )
  return t
end
return module

And here is an example of how the structured thunks are applied:

-- LuaDemo5
local st = require "StructuredThunks"
local x = { 1, 0 }
local i = 0
local function increment( y )
  y( true, y() + 1 )
end 
local function index() 
  i = i + 1
  return i
end
increment( 
  st.create( x, index() ) -- split expression x[index()]
)
print(x[1])
print(x[2])
-- prints the following:
-- 2
-- 0

The printed result corresponds to the one in VarDemo2 for call-by-reference. With this, we have found a functionally equivalent representation of the Oberon program in Lua. Note that the implementation of the function increment() in LuaDemo5 is exactly the same as in LuaDemo2. As a consequence, we only have to take care of whether the actual arguments are scalar or structured at the call site of the function, not in the function body.
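To illustrate this, here is a sketch of my own (not one of the numbered demos) where the same increment() is called once with a simple thunk for a scalar variable and once with a structured thunk for a table element; only the call sites differ:

-- Mixed call sites (sketch): same increment(), simple and structured thunks
local st = require "StructuredThunks"
local a = 1
local x = { 10, 20 }
local function increment( y )
  y( true, y() + 1 )
end
increment(
  function(set,val)            -- simple thunk for the scalar a
    if set then a = val else return a end
  end )
increment( st.create( x, 2 ) ) -- structured thunk for x[2]
print(a)    -- prints 2
print(x[2]) -- prints 21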

Now back to the question about performance. I have implemented a first version of an Oberon to Lua transpiler using the mixed simple and structured thunk concept described here, and I did some measurements using the Hennessy benchmark. Compared to Oberon code natively compiled with OBNC, the generated Lua code runs between a factor of 1 (FFT, Bubble, Towers) and a factor of 30 (Perm, Queens) slower on my machine using LuaJIT 2.0. The more VAR parameters or recursion a procedure has, the slower the generated code runs. This is likely because each read or write access to a VAR parameter requires an additional function call and two hash table lookups. And my current implementation is certainly far from optimal. But LuaJIT claims to do function inlining among many other optimizations, and the thunk functions look like good candidates for inlining. So it's too early to draw final conclusions in that respect.
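To make the per-access overhead tangible, here is a rough micro-benchmark sketch of my own (no claim to rigor; absolute numbers depend heavily on whether LuaJIT can compile the loops):

-- Micro-benchmark sketch: direct table access vs. access through a structured thunk
local st = require "StructuredThunks"
local x = { 0 }
local t0 = os.clock()
for _ = 1, 10000000 do x[1] = x[1] + 1 end
print( "direct access:", os.clock() - t0 )
x[1] = 0
local thunk = st.create( x, 1 )
t0 = os.clock()
for _ = 1, 10000000 do thunk( true, thunk() + 1 ) end
print( "thunk access:", os.clock() - t0 )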

I would expect the multiple return value-based solution to run faster from the outset, because local variables in Lua are directly addressable by numeric index (in contrast to the hash lookups for tables) and no additional function calls are required. I will consider this solution in the Oberon to LuaJIT bytecode compiler which I'm going to implement next, now that the general feasibility of the approach has been demonstrated (some of the required tools are already available). Splitting up expressions and using temporary registers is no issue with a bytecode compiler, and there are even more optimization possibilities. For the present Oberon to Lua source code transpiler, however, I prefer the thunk-based approach because the resulting code is closer to the original and easier to comprehend and check. Depending on the intended use of the transpiler, the currently achievable performance might already be sufficient, as discussed in the next section.

For the sake of completeness, I would like to mention that I also tried an approach directly on the bytecode level. The PUC Lua bytecode (which is not the same as the LuaJIT bytecode) requires explicitly declaring which local variable is to be represented by an upvalue. For this purpose, the CLOSURE opcode is followed by a pseudo instruction for each upvalue (see here for more information). This would allow us to reuse the upvalue mechanism for VAR parameters. Remember that I described above that we need a table since we cannot add variables to the closure ourselves. On the bytecode level, this is obviously possible. We thus could reuse the same function prototype and add CLOSURE and the corresponding local variable reference for each unique set of actual parameters, without needing any thunks. But unfortunately, this would only work in the PUC Lua implementation. The FNEW opcode of the LuaJIT implementation directly uses information in the function prototype header instead (see here for more information); there is no way to change the association between local variables and upvalues when instantiating the closure, so we would need a full copy of the function prototype with a dedicated upvalue specification for each unique set of actual parameters. That’s why I did not proceed with this solution.

Conclusions and Outlook

I have demonstrated how call-by-reference can be implemented in Lua and it is thus possible to create a functionally equivalent implementation of a program written in a strongly typed language like Oberon in Lua. My implementation of an Oberon to Lua source code transpiler works, but there is still room for improvement, especially with regard to performance.

The demonstrated concepts work not only with Lua but likely with any programming language that supports closures.

Reusing LuaJIT as a backend for other languages, especially statically and strongly typed ones, seems feasible. Further advantages can be expected if one generates LuaJIT bytecode directly and is thus not limited by the syntax and semantics available at the Lua source code level.

Since LuaJIT is much leaner than other JIT implementations (e.g. V8, LLVM, JVM), it might be worthwhile to also build LuaJIT-based backends for the dominant languages on those platforms (e.g. JavaScript or Java).

I personally find LuaJIT attractive as a backend because it is available on all platforms relevant to me (including embedded), includes a garbage collector, and makes it relatively easy to implement dynamically loadable modules and source-level debuggers. Since I am interested in old programming languages, I am also considering building frontends for Algol 60 and Simula 67 with LuaJIT as a backend. LuaJIT also seems especially interesting for building a debugger and simulation engine for hardware description languages such as Verilog or Lola.

The possibilities described in this article should be of particular interest to people who appreciate the excellent features of LuaJIT but, for whatever reason, would rather work with a programming language other than Lua.

Update Oct. 13

For interested readers, I would like to report on the progress and findings since the publication of this article on Oct. 10.

When I wrote the article the code generated by my Oberon to Lua transpiler was about twenty times slower on LuaJIT on average than the same program when natively compiled using OBNC. In the meantime, I was able to carry out further analyses with the tools provided by LuaJIT.

As it turned out, LuaJIT's just-in-time compiler would indeed have found the optimal traces and replaced the function calls of the thunks with linear code. But since not all opcodes, especially FNEW, are fully implemented in the tracing compiler yet, these traces were discarded. I was able to adapt my generator so that the thunk functions are no longer defined in place but assigned to local variables, which avoids calls of FNEW in the trace. In addition, functions that call themselves recursively no longer generate redundant thunks.
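As a rough sketch of the general idea (my own interpretation, not the generator's actual output): if the thunk closure is created once, outside the hot loop, no FNEW instruction is encountered inside the traced code:

-- Sketch: hoisting the thunk closure out of the hot loop so that no FNEW
-- instruction is executed inside the traced loop body
local x = 0
local function increment( y )
  y( true, y() + 1 )
end
local x_thunk = function(set,val)   -- closure created once, not per call
  if set then x = val else return x end
end
for _ = 1, 1000 do
  increment( x_thunk )              -- no closure construction inside the loop
end
print(x) -- prints 1000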

These measures (hoisting the thunks and eliminating redundant ones) resulted in a considerable speedup, so that the generated code now runs only three times slower in the geometric mean than the code compiled with OBNC. I still see further potential for optimization, even if this performance is already in the same range as the expectations described at the beginning of this article. The updated version of the generator is on GitHub.

Update Dec. 17

Meanwhile, I was able to implement a new semantic analyzer and abstract syntax tree (AST) generator with a visitor-based processing pipeline. In contrast to my first approach, it is now possible to do several passes and to rewrite the AST. The new translator first replaces all procedure and function calls that have VAR parameters with equivalent call-by-value calls using multiple return values (sketched below). The updated version of the translator is on GitHub.
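As a sketch of the kind of rewrite involved (a hypothetical procedure of my own, not taken from the translator's output): a function result and a changed VAR parameter are both returned, and the caller assigns the extra value back:

-- Rewrite sketch: a hypothetical PROCEDURE incAndCheck(VAR y: INTEGER): BOOLEAN
-- translated to call-by-value with multiple return values
local function incAndCheck( y )
  y = y + 1
  return y > 10, y       -- original result first, changed VAR parameter appended
end
local x = 9
local big
big, x = incAndCheck( x ) -- the caller writes the extra return value back
print( big, x ) -- prints: false 10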

The resulting speedup is considerable. The generated code now runs only two times slower in median, average and geometric mean than the code compiled with OBNC. But implementing the translator was considerably more work, so there is, not surprisingly, a trade-off between performance and effort.

I will do further experiments and presumably reimplement a more optimized thunk-based version using the new pipeline (in contrast to the previous syntax-directed translator), and also implement a direct-to-bytecode compiler, which opens up yet more optimization potential.

Update January 2020

The Oberon to LuaJIT bytecode compiler is now a reality. As far as I know, this is the first (working) compiler of its kind. The compiler doesn't do any optimizations yet, but even so I achieved another significant speedup. The generated code now runs only 1.4 times slower in median and geometric mean than the code compiled with OBNC. And this is not yet the maximum speed, since the current implementation doesn't use the LuaJIT FFI or tail calls yet.

References