summaryrefslogtreecommitdiffstats
path: root/js/src/doc/JITOptimizations/Strategies.md
blob: d4feef3103bc9097e179d9660df02abb8f4da918 (plain)
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440
441
442
443
444
445
446
447
448
449
450
451
452
453
454
455
456
457
458
459
460
461
462
463
464
465
466
467
468
469
470
471
472
473
474
475
476
477
478
479
480
481
482
483
484
485
486
487
488
489
490
491
492
493
494
495
496
497
498
499
500
501
502
503
504
505
506
507
508
509
510
511
512
513
514
515
516
517
518
519
520
521
522
523
524
525
526
527
528
529
530
531
532
533
534
535
536
537
538
539
540
541
542
543
544
545
546
547
548
549
550
551
552
553
554
555
556
557
558
559
560
561
562
563
564
565
566
567
568
569
570
571
572
573
574
575
576
577
578
579
580
581
582
# JIT Optimization Strategies

SpiderMonkey's optimizing JIT, IonMonkey, uses a number of different
optimization strategies to speed up various operations. The most commonplace
operations that are relevant for fast program execution are property accesses
and function calls.

Optimization information is currently collected for the following operations:

- [BinaryArith](#binaryarith) (`+-/*%`)
- [GetProperty](#getprop) (`obj.prop`)
- [SetProperty](#setprop) (`obj.prop = val`)
- [GetElement](#getelem) (`obj[elemName]`)
- [SetElement](#setelem) (`obj[elemName] = val`)
- [Call](#call) (`func(...)`)

At each operation site, IonMonkey tries a battery of <i>strategies</i>, from
the most optimized but most restrictive to the least optimized but least
restrictive. For each strategy attempted, its <i>outcome</i> is tracked. An
outcome is either success or a reason why the strategy failed to apply.

This page documents the various optimization strategies and their outcomes. It
provides information on what they attempt to do, what general level of
speed-up they provide, what kind of program characteristics can prevent them
from being used, and common ways to enable the engine to utilize that
optimization.

## <a name="binaryarith"></a>BinaryArith

### BinaryArith_Concat

Attempts to optimize an addition to string concatenation. The types of the operands are checked to contain a hint of being a concatenation. Both have to be string or one has to be a string and the other a type that easily can get converted to string (like numbers).

### BinaryArith_SpecializedTypes

Attempts to do a numeric arithmetic based on operand types. One of the inputs need to be a numeric type and the other a simple arithmetic type.

### BinaryArith_SpecializedOnBaselineTypes

Just like BinrayArith_SpecializedTypes tries to do a numeric arithmetic, but based on the observed types in the Baseline Compiler. If it succeeds it will roughly give same optimization as BinaryArith_SpecializedTypes, but the deduction is more fragile. In the best case one should try to get this optimization based on the input types.

### BinaryArith_SharedCache

Attempts to optimize a binary arithmetic using inline cache.

### BinaryArith_Call

Last resort which cannot get specialized at all and takes the slow path to do the arithmetic.


## <a name="getprop"></a>GetProperty

### GetProp_ArgumentsLength

Attempts to optimize an `arguments.length` property access. This optimization
only works if the arguments object is used in well-understood ways within the
function. The function containing the arguments.length is allowed to use the
arguments object in the following ways without disabling this optimization:

- Access `arguments.length`
- Access `arguments.callee`
- Access individual args using `arguments[i]`
- Save arguments into variables, as long as those variables cannot be accessed
  by any nested function, and as long as there exists no eval nywhere within
  the function or nested function definitions.
- Call a function using `f.apply(obj, arguments)`

If the function contains any use of the arguments object that falls out of the
cases defined above, this optimization will be suppressed. In particular,
`arguments` cannot be returned from the function, or passed as an argument into
calls (except for the `apply` case above).

### GetProp_ArgumentsCallee

Attempts to optimize an `arguments.callee` property access. This optimization
only works if the arguments object is used in well-understood ways within the
function. The function containing the `arguments.callee` is allowed to use the
arguments object in the following ways without disabling this optimization:

- Access arguments.length
- Access arguments.callee
- Access individual args using arguments[i]
- Save arguments into variables, as long as those variables cannot be accessed
  by any nested function, and as long as there exists no eval nywhere within
  the function or nested function definitions.
- Call a function using `f.apply(obj, arguments)`

If the function contains any use of the `arguments` object that falls out of
the cases defined above, this optimization will be suppressed. In particular,
arguments cannot be returned from the function, or passed as an argument into
calls (except for the `apply` example listed above).

### GetProp_InferredConstant

Attempts to optimize an access to a property that seems to be a constant. It
applies to property accesses on objects which are global-like in that there is
only one instance of them per program. This includes global objects, object
literals defined at the top-level of a script, and top-level function objects.

This optimization makes the assumption that a property that has not changed
after it was first assigned, is likely a constant property.  It then directly
inlines the value of the property into hot code that accesses it. For
example, in the following code:

    var Constants = {};
    Constants.N = 100;

    function testArray(array) {
      for (var i = 0; i < array.length; i++) {
        if (array[i] > Constants.N)
          return true;
      }
      return false;
    }

Will have the loop compiled into the following when `testArray` gets hot.

    for (var i = 0; i < array.length; i++) {
      if (array[i] > 100)
        return true;
    }

When this optimization is successful, property access is eliminated entirely
and replaced with an inline constant.

### GetProp_Constant

Attempts to optimize reading a property that contains a uniquely-typed (or
"singleton") object.  With uniquely-typed objects, it is guaranteed that
no other object has that same type.  Unique (or "singleton") types are
assigned to certain kinds of objects, like global objects, top-level
functions, and object literals declared at the top-level of a script.  If
a property has always contained the same uniquely-typed object, then the
engine can use the unique type to map back to a specific object, and
eliminate the property access, replacing it with a constant reference to
the object.

When this optimization is successful, property access is eliminated entirely
and replaced with an inline constant. The different success and failure
conditions are documented below:

### GetProp_StaticName

Attempts to optimize a property access on `window` which refers to
a property on the global object.

### GetProp_TypedObject

Optimizes accesses to properties on TypedObjects.

### GetProp_DefiniteSlot

Optimizes access to a well-known regular property on an object.  For this
optimization to succeed, the property needs to be well-defined on the object.
For objects constructed by constructor functions, this means that the property
needs to be defined in the constructor, before any complex logic occurs within
the constructor.

This is the best case for a regular "field" type property that is not
turned into a constant.  It compiles down to a single CPU-level load
instruction.

    function SomeConstructor() {
        this.x = 10;    // x is a definite slot property
        this.y = 10;    // y is a definite slot property
        someComplicatedFunctionCall();
        this.z = 20;    // z is not a definite slot property.
    }

In the above example, the properties `x` and `y` can be determined to always
exist on any instance of `SomeConstructor` at definite locations, allowing
the engine to deterministically infer the position of `x` without a shape
guard.

This optimization can fail for a number of reasons.  If the types observed
at the property access are polymorphic (more than one type), this optimization
cannot succeed.  Furthermore, even if the object type is monomorphic, the
optimization will fail if the property being accessed is not a definite
slot as described above.

### GetProp_Unboxed

Similar to `GetProp_DefiniteSlot`.  Unboxed property reads are possible on
properties which satisfy all the characteristics of a definite slot, and
additionally have been observed to only store values of one kind of value.

Consider the following constructor:

    function Point(x, y) {
        this.x = x;
        this.y = y;
    }

If only integers are ever stored in the `x` and `y` properties,
then the instances of `Point` will be represented in an "unboxed" mode -
with the property values stored as raw 4-byte values within the object.

Objects which have the unboxed optimization are more compact.

### GetProp_CommonGetter

Optimizes access to properties which are implemented by a getter function,
where the getter is shared between multiple types.

This optimization applies most often when the property access site is
polymorphic, but all the object types are derived variants of a single
base class, where the property access refers to a getter on the base
class.

Consider the following example:
    function Base() {}
    Base.prototype = {
        get x() { return 3; }
    };

    function Derived1() {}
    Derived1.prototype = Object.create(Base.prototype);

    function Derived2() {}
    Derived1.prototype = Object.create(Base.prototype);

If a property access for `d.x` sees only instances of both `Derived1` and
`Derived2` for `d`, it can optimize the access to `x` to a call to the
getter function defined on `Base`.

This optimization applies to shared getters on both pure JS objects as well
as DOM objects.

### GetProp_InlineAccess

Optimizes a polymorphic property access where there are only a few different
types of objects seen, and the property on all of the different types is
determinable through a shape-check.

If a property access is monomorphic and the property's location is determinable
from the object's shape, but the property is not definite (see:
GetProp_DefiniteProperty), then this optimization may be used.

Alternatively, if the property access is polymorphic, but only has a few
different shapes observed at the access site, this optimization may be used.

This optimization compiles down to one-or more shape-guarded direct loads
from the object.  The following pseudocode describes the kind of machine
code generated by this optimization:

    if obj.shape == Shape1 then
      obj.slots[0]
    elif obj.shape == Shape2 then
      obj.slots[5]
    elif obj.shape == Shape3 then
      obj.slots[2]
    ...
    end

### GetProp_Innerize

Attempts to optimize a situation where a property access of the form
`window.PROP` can be directly translated into a property access on
the inner global object.

This optimization will always fail on property accesses which are
not on the window object.

It is useful because accessing global names via the 'window' object
is a common idiom in web programming.

### GetProp_InlineCache

This is the worst-case scenario for a property access optimization.  This
strategy is used when all the others fail.  The engine simply inserts
an inline cache at the property access site.

Inline caches start off as a jump to a separate piece of code called
a "fallback".  The fallback calls into the interpreter VM (which is
very slow) to perform the operation, and then decides if the operation
can be optimized in that particular case.  If so, it generates a new
"stub" (or freestanding piece of jitcode) and changes the inline cache
to jump to the stub.  The stub attempts to optimize further occurrences
of that same kind of operation.

Inline caches are an order of magnitude slower than the other optimization
strategies, and are an indication that the type inference engine has
failed to collect enough information to guide the optimization process.

## <a name="setprop"></a>SetProperty

### SetProp_CommonSetter

Optimizes access to properties which are implemented by a setter function,
where the setter is shared between multiple types.

This optimization applies most often when the property access site is
polymorphic, but all the object types are derived variants of a single
base class, where the property access refers to a setter on the base
class.

Consider the following example:
    function Base() {}
    Base.prototype = {
        set x(val) { ... }
    };

    function Derived1() {}
    Derived1.prototype = Object.create(Base.prototype);

    function Derived2() {}
    Derived1.prototype = Object.create(Base.prototype);

If a property write for `d.x = val` sees only instances of both `Derived1` and
`Derived2` for `d`, it can optimize the write to `x` to a call to the
setter function defined on `Base`.

This optimization applies to shared setters on both pure JS objects as well
as DOM objects.

### SetProp_TypedObject

Optimizes accesses to properties on TypedObjects.

### SetProp_DefiniteSlot

Optimizes a write to a well-known regular property on an object.  For this
optimization to succeed, the property needs to be well-defined on the object.
For objects constructed by constructor functions, this means that the property
needs to be defined in the constructor, before any complex logic occurs within
the constructor.

This is the best case for a regular "field" type property that is not
turned into a constant.  It compiles down to a single CPU-level load
instruction.

    function SomeConstructor() {
        this.x = 10;    // x is a definite slot property
        this.y = 10;    // y is a definite slot property
        someComplicatedFunctionCall();
        this.z = 20;    // z is not a definite slot property.
    }

In the above example, the properties `x` and `y` can be determined to always
exist on any instance of `SomeConstructor` at definite locations, allowing
the engine to deterministically infer the position of `x` without a shape
guard.

This optimization can fail for a number of reasons.  If the types observed
at the property access are polymorphic (more than one type), this optimization
cannot succeed.  Furthermore, even if the object type is monomorphic, the
optimization will fail if the property being written is not a definite
slot as described above.

### SetProp_Unboxed

Similar to `SetProp_DefiniteSlot`.  Unboxed property writes are possible on
properties which satisfy all the characteristics of a definite slot, and
additionally have been observed to only store values of one kind of value.

Consider the following constructor:

    function Point(x, y) {
        this.x = x;
        this.y = y;
    }

If only integers are ever stored in the `x` and `y` properties,
then the instances of `Point` will be represented in an "unboxed" mode -
with the property values stored as raw 4-byte values within the object.

Objects which have the unboxed optimization are more compact.

### SetProp_InlineAccess

Optimizes a polymorphic property write where there are only a few different
types of objects seen, and the property on all of the different types is
determinable through a shape-check.

If a property write is monomorphic and the property's location is determinable
from the object's shape, but the property is not definite (see:
GetProp_DefiniteProperty), then this optimization may be used.

Alternatively, if the property write is polymorphic, but only has a few
different shapes observed at the access site, this optimization may be used.

This optimization compiles down to one-or more shape-guarded direct stores
to the object.  The following pseudocode describes the kind of machine
code generated by this optimization:

    if obj.shape == Shape1 then
      obj.slots[0] = val
    elif obj.shape == Shape2 then
      obj.slots[5] = val
    elif obj.shape == Shape3 then
      obj.slots[2] = val
    ...
    end

### SetProp_InlineCache

This is the worst-case scenario for a property access optimization.  This
strategy is used when all the others fail.  The engine simply inserts
an inline cache at the property write site.

Inline caches start off as a jump to a separate piece of code called
a "fallback".  The fallback calls into the interpreter VM (which is
very slow) to perform the operation, and then decides if the operation
can be optimized in that particular case.  If so, it generates a new
"stub" (or freestanding piece of jitcode) and changes the inline cache
to jump to the stub.  The stub attempts to optimize further occurrences
of that same kind of operation.

Inline caches are an order of magnitude slower than the other optimization
strategies, and are an indication that the type inference engine has
failed to collect enough information to guide the optimization process.

## <a name="getelem"></a>GetElement

### GetElem_TypedObject

Attempts to optimized element accesses on array Typed Objects.

### GetElem_Dense

Attempts to optimize element accesses on densely packed array objects.  Dense
arrays are arrays which do not have any 'holes'.  This means that the array
has valid values for all indexes from `0` to `length-1`.

### GetElem_TypedStatic

Attempts to optimize element accesses on a typed array that can be determined
to always refer to the same array object.  If this optimization succeeds, the
'array' object is treated as a constant, and is not looked up or retrieved from
a variable.

### GetElem_TypedArray

Attempts to optimize element accesses on a typed array.

### GetElem_String

Attempts to optimize element accesses on a string.

### GetElem_Arguments

Attempts to optimize element accesses on the `arguments` special object
available in functions.  This optimization only works if the arguments
object is used in well-understood ways within the function. The function
containing the arguments.length is allowed to use the arguments object in
the following ways without disabling this optimization:

- Access `arguments.length`
- Access `arguments.callee`
- Access individual args using `arguments[i]`
- Save `arguments` into variables, as long as those variables cannot be
  accessed by any nested function, and as long as there exists no `eval`
  anywhere within the function or nested function definitions.
- Call a function using `f.apply(obj, arguments)`

If the function contains any use of the arguments object that falls out of the
cases defined above, this optimization will be suppressed. In particular,
`arguments` cannot be returned from the function, or passed as an argument into
calls (except for the `apply` case above).

### GetElem_ArgumentsInlined

Similar to GetEelem_Arguments, but optimizes cases where the access on
`arguments` is happening within an inlined function.  In these cases, an
access of the form `arguments[i]` can be directly translated into a
direct reference to the corresponding argument value in the inlined call.

Consider the following:
    function foo(arg) {
      return bar(arg, 3);
    }
    function bar() {
      return arguments[0] + arguments[1];
    }

In the above case, if `foo` is compiled with Ion, and the call to `bar`
is inlined, then this optimization can transform the entire procedure to
the following pseudo-code:

    compiled foo(arg):
        // inlined call to bar(arg, 3) {
        return arg + 3;
        // }

### GetElem_InlineCache

This is the worst-case scenario for a element access optimization.  This
strategy is used when all the others fail.  The engine simply inserts
an inline cache at the property write site.

Inline caches start off as a jump to a separate piece of code called
a "fallback".  The fallback calls into the interpreter VM (which is
very slow) to perform the operation, and then decides if the operation
can be optimized in that particular case.  If so, it generates a new
"stub" (or freestanding piece of jitcode) and changes the inline cache
to jump to the stub.  The stub attempts to optimize further occurrences
of that same kind of operation.

Inline caches are an order of magnitude slower than the other optimization
strategies, and are an indication that the type inference engine has
failed to collect enough information to guide the optimization process.

## <a name="setelem"></a>SetElement

### SetElem_TypedObject

Attempts to optimized element writes on array Typed Objects.

### SetElem_TypedStatic

Attempts to optimize element writes on a typed array that can be determined
to always refer to the same array object.  If this optimization succeeds, the
'array' object is treated as a constant, and is not looked up or retrieved from
a variable.

### SetElem_TypedArray

Attempts to optimize element writes on a typed array.

### SetElem_Dense

Attempts to optimize element writes on densely packed array objects.  Dense
arrays are arrays which do not have any 'holes'.  This means that the array
has valid values for all indexes from `0` to `length-1`.

### SetElem_Arguments

Attempts to optimize element writes to the `arguments` special object
available in functions.  This optimization only works if the arguments
object is used in well-understood ways within the function. The function
containing the arguments.length is allowed to use the arguments object in
the following ways without disabling this optimization:

- Access `arguments.length`
- Access `arguments.callee`
- Access individual args using `arguments[i]`
- Save `arguments` into variables, as long as those variables cannot be
  accessed by any nested function, and as long as there exists no `eval`
  anywhere within the function or nested function definitions.
- Call a function using `f.apply(obj, arguments)`

If the function contains any use of the arguments object that falls out of the
cases defined above, this optimization will be suppressed. In particular,
`arguments` cannot be returned from the function, or passed as an argument into
calls (except for the `apply` case above).

### SetElem_InlineCache

This is the worst-case scenario for a element write optimization.  This
strategy is used when all the others fail.  The engine simply inserts
an inline cache at the property write site.

Inline caches start off as a jump to a separate piece of code called
a "fallback".  The fallback calls into the interpreter VM (which is
very slow) to perform the operation, and then decides if the operation
can be optimized in that particular case.  If so, it generates a new
"stub" (or freestanding piece of jitcode) and changes the inline cache
to jump to the stub.  The stub attempts to optimize further occurrences
of that same kind of operation.

Inline caches are an order of magnitude slower than the other optimization
strategies, and are an indication that the type inference engine has
failed to collect enough information to guide the optimization process.

## <a name="call"></a>Call

### Call_Inline

A function call `f(x)` usually pushes a frame onto the call stack. Inlining a
call site conceptually copies the body of the callee function and pastes it
in place of the call site and avoids pushing a new execution frame. Usually,
hot functions do well to be inlined. This is one of the most important
optimizations the JIT performs.

Ion inlines both interpreted (i.e., written in JavaScript) functions and
native (i.e., built-ins such as `Math.sin` implemented in C++).

A successfully inlined call site has the outcome Inlined.

Failure to inline comes in two flavors: unable (e.g., unable to determine
exact callee) and unwilling (e.g., heuristics concluded that the time-space
tradeoff will not pay off).