A Proposal For Numeric Separators
This feature enables developers to make their numeric literals more readable by creating a visual separation between groups of digits.
var thousands = 10_000; // Instead of 10000.
var credit_card_number = 1234_5678_9012_3456; // Instead of 1234567890123456.
var social_security_number = 999_99_9999; // Instead of 999999999.
var pi = 3.14_15; // Instead of 3.1415
var bytes = 0b11010010_01101001_10010100_10010010; // Instead of 0b11010010011010011001010010010010.
var words = 0xCAFE_BABE; // Instead of 0xCAFEBABE.
This feature is designed to have no impact on the interpretation semantics of numeric literals:
Underscores (_) are ignored by interpreters and have no effect. They are meant exclusively as a visual cue to aid development and have no runtime semantics.
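To make the "no runtime semantics" claim concrete, here is a minimal sketch that models a separated literal as source text and shows it denotes the same value as the plain one. The helper name valueOfLiteral is hypothetical and is not part of the proposal; it only imitates what a parser might do by dropping the underscores before conversion.

```javascript
// Hypothetical helper: models how a parser could treat a separated literal.
// It is an illustration only, not the proposed specification text.
function valueOfLiteral(sourceText) {
  // Removing every underscore leaves the literal as it would be written today.
  return Number(sourceText.replace(/_/g, ""));
}

// Each comparison prints true: the separator changes readability, not the value.
console.log(valueOfLiteral("10_000") === 10000);
console.log(valueOfLiteral("3.14_15") === 3.1415);
console.log(valueOfLiteral("0xCAFE_BABE") === 0xCAFEBABE);
console.log(valueOfLiteral("0b1101_0010") === 0b11010010);
```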
We want to optimize to cover the most common use cases and discourage patterns that would be frowned upon in style guides later on.
With that in mind, here is what we think is a good balance:
- to use the _ character.
- only one underscore in a row is allowed (no consecutive underscores).
- only between digits (not allowed at the beginning or end of literals).
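The rules above can be sketched as a small validator over literal source text. This is an illustration under stated assumptions: the function name isValidSeparatedLiteral is hypothetical, exponents and legacy octal literals are deliberately left out, and the patterns are one possible reading of the rules rather than the proposed grammar.

```javascript
// Hypothetical validator for the strawman rules: "_" only, never doubled,
// and always between two digits. Exponents are not modeled here.
function isValidSeparatedLiteral(text) {
  // One or more digits of a class, with at most one "_" between two digits.
  const group = (digit) => `${digit}+(?:_${digit}+)*`;
  const patterns = [
    new RegExp(`^0[xX]${group("[0-9a-fA-F]")}$`), // hexadecimal
    new RegExp(`^0[bB]${group("[01]")}$`),        // binary
    new RegExp(`^0[oO]${group("[0-7]")}$`),       // octal
    new RegExp(`^${group("[0-9]")}(?:\\.${group("[0-9]")})?$`), // decimal
  ];
  return patterns.some((pattern) => pattern.test(text));
}

console.log(isValidSeparatedLiteral("10_000"));      // true
console.log(isValidSeparatedLiteral("0xCAFE_BABE")); // true
console.log(isValidSeparatedLiteral("10__000"));     // false: consecutive underscores
console.log(isValidSeparatedLiteral("_100"));        // false: leading underscore
console.log(isValidSeparatedLiteral("100_"));        // false: trailing underscore
console.log(isValidSeparatedLiteral("0x_52"));       // false: not between digits
```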
We want to make sure that this syntax is consistent with the usage of the standard library. This will involve making libraries like the following compatible with the proposed parsing rules:
Our strawman strategy is to start with a more restrictive rule (i.e. disallow both multiple consecutive underscores and leading/trailing underscores) and loosen it later if needed, as opposed to starting more broadly and worrying about backwards compatibility when trying to tighten it up later.
In addition to that, we couldn't find good/practical evidence where (a) multiple consecutive underscores or (b) underscores before/after numbers are used effectively, so we chose to leave that addition to a later stage if needed/desired.
The main considerations as we look into other languages are:
- should we allow multiple separators (e.g. enforcing 10_000 or allowing 10_________000)?
- what are the restrictions on location (e.g. are leading or trailing separators such as _100 allowed, or must separators sit between digits, as in 10_000_000)?
- which separator character to use (e.g. 1_000, 1,000, 1 000)?
Common rules available in other languages are:
- Multiple consecutive underscores allowed, only between digits.
- Multiple consecutive underscores allowed in most positions, except for the start of the literal or special positions like a decimal point.
- Only every N digits (e.g. N = 3 for decimal literals or 4 for hexadecimal ones).
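As a rough comparison, the rule families listed above can be written as patterns and run against the same inputs. Everything here is an assumption made for illustration: the names are hypothetical, only decimal integers are covered, and each pattern is a simplified reading of the corresponding rule, with the single-underscore rule from this proposal added for contrast.

```javascript
// Hypothetical patterns, decimal integers only, one per rule family.
const separatorRules = {
  singleBetweenDigits: /^\d+(?:_\d+)*$/,    // the strawman in this proposal
  multipleBetweenDigits: /^\d+(?:_+\d+)*$/, // multiple allowed, between digits
  anywhereExceptStart: /^\d[\d_]*$/,        // multiple allowed, not at the start
  everyThreeDigits: /^\d{1,3}(?:_\d{3})*$/, // fixed grouping with N = 3
};

// Returns the names of the rules that accept the given literal text.
function acceptedBy(text) {
  return Object.keys(separatorRules).filter((name) =>
    separatorRules[name].test(text)
  );
}

console.log(acceptedBy("10_000"));    // all four rules
console.log(acceptedBy("10___000"));  // multipleBetweenDigits, anywhereExceptStart
console.log(acceptedBy("100_"));      // anywhereExceptStart
console.log(acceptedBy("1234_5678")); // every rule except everyThreeDigits
```

The stricter a rule is, the fewer spellings of the same number it admits, which is the trade-off the considerations above weigh.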
More work needs to be done to determine the feasibility and desirability of using different characters. As a reference, most languages use _ (C++ being the notable exception, using '), so _ is a reasonable starting point.
Here are some characters that should be looked at to assess feasibility (i.e. is it grammatically possible?) and desirability (e.g. does it lead to more readable code?):
- _ (Java, Python, Perl, Ruby, Rust, Julia, Ada, C#)
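As a small feasibility check for two of the alternative separators mentioned earlier (1,000 and 1 000), the sketch below shows how they behave in the language as it exists today. The helper name parses is hypothetical; it uses the Function constructor only to test whether a snippet is syntactically valid, and it says nothing about desirability.

```javascript
// A comma already separates array elements and arguments,
// so "1,234" is read as two values rather than one number.
const asArrayElements = [1,234];
console.log(asArrayElements.length); // 2

// Hypothetical helper: reports whether an expression is valid syntax today.
function parses(source) {
  try {
    new Function(`return (${source});`); // compiles the snippet without running it
    return true;
  } catch (error) {
    return false;
  }
}

console.log(parses("1 234")); // false: a space between digit groups is a syntax error
console.log(parses("1,234")); // true, but it is a comma expression, not one literal
```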
This strawman proposal was developed with @ajklein and @domenic.
- Java7: multiple, only between digits.
long creditCardNumber = 1234_5678_9012_3456L;
long socialSecurityNumber = 999_99_9999L;
float pi = 3.14_15F;
long hexBytes = 0xFF_EC_DE_5E;
long hexWords = 0xCAFE_BABE;
long maxLong = 0x7fff_ffff_ffff_ffffL;
byte nybbles = 0b0010_0101;
long bytes = 0b11010010_01101001_10010100_10010010;
float pi1 = 3_.1415F; // Invalid; cannot put underscores adjacent to a decimal point
float pi2 = 3._1415F; // Invalid; cannot put underscores adjacent to a decimal point
long socialSecurityNumber1 = 999_99_9999_L; // Invalid; cannot put underscores prior to an L suffix
int x1 = _52; // This is an identifier, not a numeric literal
int x2 = 5_2; // OK (decimal literal)
int x3 = 52_; // Invalid; cannot put underscores at the end of a literal
int x4 = 5_______2; // OK (decimal literal)
int x5 = 0_x52; // Invalid; cannot put underscores in the 0x radix prefix
int x6 = 0x_52; // Invalid; cannot put underscores at the beginning of a number
int x7 = 0x5_2; // OK (hexadecimal literal)
int x8 = 0x52_; // Invalid; cannot put underscores at the end of a number
int x9 = 0_52; // OK (octal literal)
int x10 = 05_2; // OK (octal literal)
int x11 = 052_; // Invalid; cannot put underscores at the end of a number
- C++: single, between digits (different separator chosen: ').
int m = 36'000'000; // digit separators make large values more readable
TODO(goto): find an example.
- Perl: multiple, anywhere
3.14_15_92 # a very important number
4_294_967_296 # underscore for legibility
0xff # hex
0xdead_beef # more hex
- Ruby: single, only between digits.
- Rust: multiple, anywhere.
0b1111_1111_1001_0000_i32; // type i32
1_234.0E+18f64
- Julia: single, only between digits.
julia> 10_000, 0.000_000_005, 0xdead_beef, 0b1011_0010
(10000,5.0e-9,0xdeadbeef,0xb2)
- Ada: single, only between digits.
- Python Proposal: Underscore in Numeric Literals: single, only between digits.
# grouping decimal numbers by thousands
amount = 10_000_000.0
# grouping hexadecimal addresses by words
addr = 0xCAFE_F00D
# grouping bits into nibbles in a binary literal
flags = 0b_0011_1111_0100_1110
# same, for string conversions
flags = int('0b_1111_0000', 2)
- C# Proposal: Digit Separators: multiple, only between digits.
int bin = 0b1001_1010_0001_0100;
int hex = 0x1b_a0_44_fe;
int dec = 33_554_432;
int weird = 1_2__3___4____5_____6______7_______8________9;
double real = 1_000.111_1e-1_000;