sgo.to

A Proposal For Numeric Separators

Numeric Separators

This is a strawman proposal to introduce a separator in numeric literals in ECMAScript.

Motivation

This feature enables developers to make their numeric literals more readable by creating a visual separation between groups of digits.

For example:

var thousands = 10_000; // Instead of 10000.
var credit_card_number = 1234_5678_9012_3456; // Instead of 123456789012345.
var social_security_number = 999_99_9999; // Instead of 999999999.
var pi = 3.14_15; // Instead of 3.1415
var bytes = 0b11010010_01101001_10010100_10010010; // Instead of 0b11010010011010011001010010010010.
var 0xCAFE_BABE; // Instead of 0XCAFEBABE.

Strawnman

Semantics

This feature is designed to have no impact on the interpretation semantics of numeric literals: _ are to be ignored by interpreters and should have no effect. They are meant exclusively as a visual clue to aid development and have no runtime semantics.

Syntax

We want to optimize to cover the most common use cases and discourage patterns that would be frowned upon in style guides later on.

With that in mind, here is what we think is a good balance:

  • to use the _ character.
  • only one consecutive underscore is allowed.
  • only between digits (not allowed at the beginning or end of literals).

Standard library

We want to make sure that this syntax is consistent with the usage of the standard library. This will involve making libraries like the following compatible with the proposed parsing rules:

  • Number()
  • parseInt()

Alternative Syntax

Our strawnman strategy is to start with a more restrictive rule (i.e. disallow both idioms) and losen it upon later if needed (as opposed to starting more broadly and worrying about backwards compatibility trying to tighten it up later).

In addition to that, we couldn't find good/practical evicence where (a) multiple consecutive underscores or (b) underscores before/after numbers are used effectively, so we chose to leave that addition to a later stage if needed/desired.

Considerations

The main considerations as we look into other languages are:

  • should we allow multiple separators (e.g. enforcing 10_000 or allowing 10_________000)?
  • what are the restrictions on location (e.g. head/tail allowed _100? or does it need to be between numbers 10_000_000?)?
  • which separator digit to use (e.g. 1_000, 1,000 , 1 000)?

Common rules available in other languages are:

  • Multiple consecutive underscore allowed and only between digits
  • Multiple consecutive underscore allowed, in most positions except for the start of the literal or special positions like a decimal point.
  • Only every other N digits (e.g. N = 3 for decimal literals or 4 for hexadecimal ones)

Character

More work needs to be done to determine the feasibility and desirability of using different characters. As a reference, most languages use _ (C++ being the notable exception to use '), so _ is a reasonable starting point.

Here are some characters that should be looked at to assess feasibility (i.e. is it gramatically possible?) and desirability (e.g. does it lead to a more readable code?):

  • _ (Java, Python, Perl, Ruby, Rust, Julia, Ada, C#)
  • ' (C++)
  • .

Acknowledgements

This strawnman proposal was developed with @ajklein and @domenic.

References

Prior art

  • Java7: multiple, only between digits.
long creditCardNumber = 1234_5678_9012_3456L;
long socialSecurityNumber = 999_99_9999L;
float pi = 3.14_15F;
long hexBytes = 0xFF_EC_DE_5E;
long hexWords = 0xCAFE_BABE;
long maxLong = 0x7fff_ffff_ffff_ffffL;
byte nybbles = 0b0010_0101;
long bytes = 0b11010010_01101001_10010100_10010010;

Trade-offs:

float pi1 = 3_.1415F;      // Invalid; cannot put underscores adjacent to a decimal point
float pi2 = 3._1415F; // Invalid; cannot put underscores adjacent to a decimal point
long socialSecurityNumber1
= 999_99_9999_L; // Invalid; cannot put underscores prior to an L suffix

int x1 = _52; // This is an identifier, not a numeric literal
int x2 = 5_2; // OK (decimal literal)
int x3 = 52_; // Invalid; cannot put underscores at the end of a literal
int x4 = 5_______2; // OK (decimal literal)

int x5 = 0_x52; // Invalid; cannot put underscores in the 0x radix prefix
int x6 = 0x_52; // Invalid; cannot put underscores at the beginning of a number
int x7 = 0x5_2; // OK (hexadecimal literal)
int x8 = 0x52_; // Invalid; cannot put underscores at the end of a number

int x9 = 0_52; // OK (octal literal)
int x10 = 05_2; // OK (octal literal)
int x11 = 052_; // Invalid; cannot put underscores at the end of a number
  • C++: single, between digits (different separator chosen ').
int m = 36'000'000  // digit separators make large values more readable  
TODO(goto): find an example.
  • Perl: multiple, anywhere
 3.14_15_92          # a very important number
4_294_967_296 # underscore for legibility
0xff # hex
0xdead_beef # more hex
  • Ruby: single, only between digits.
1_234
  • Rust: multiple, anywhere.
0b1111_1111_1001_0000_i32;         // type i32
1_234.0E+18f64
  • Julia: single, only between digits.
julia> 10_000, 0.000_000_005, 0xdead_beef, 0b1011_0010
(10000,5.0e-9,0xdeadbeef,0xb2)
  • Ada: single, only between digits.
123_456
3.14159_26

Ongoing Proposals

# grouping decimal numbers by thousands
amount = 10_000_000.0

# grouping hexadecimal addresses by words
addr = 0xCAFE_F00D

# grouping bits into nibbles in a binary literal
flags = 0b_0011_1111_0100_1110

# same, for string conversions
flags = int('0b_1111_0000', 2)
int bin = 0b1001_1010_0001_0100;
int hex = 0x1b_a0_44_fe;
int dec = 33_554_432;
int weird = 1_2__3___4____5_____6______7_______8________9;
double real = 1_000.111_1e-1_000;

Related Work