
The counterpoint is how many PL creators ignore all existing research and go on to make the same old mistakes.


E.g., Java repeating Hoare's 'billion dollar mistake' of allowing null references when there is no great justification for it.


Honestly I don't think including null in the language is a big deal. Academics tend to think it is a big deal because it makes it harder to prove programs correct, but almost nobody does that with Java programs, so it doesn't really matter to most programmers.

null is a natural representation for an optional type. It's efficient because the CPU can implement it by page faulting on null dereferences. Yes, you can have Optional<T> and so forth, but it uses more memory, and Java already has a problem with using a lot of memory.

Even assuming we could somehow solve the efficiency problems with a super-clever compiler and a built-in optional type, the language needs a way to represent the state that an object starts out with. For integers and floats that is 0, for booleans it is false, and for objects it is null. You could argue with the whole concept of imperative programming (and many academics do), but if X happens before Y in your imperative program, you'd better have a way of representing, to X, the state of the objects that Y has not yet initialized.
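For reference, those defaults are easy to observe directly (a minimal sketch; the class and field names here are made up for illustration):

```java
// Java zero-initializes instance fields: numeric types to 0,
// booleans to false, and all reference types to null.
class Defaults {
    int i;
    double d;
    boolean b;
    String s;

    public static void main(String[] args) {
        Defaults x = new Defaults();
        System.out.println(x.i + " " + x.d + " " + x.b + " " + x.s);
        // prints: 0 0.0 false null
    }
}
```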

C++ got a non-nullable type in the form of references, and it didn't exactly turn the language into a paragon of sweetness and light. It just made things more confusing and annoying since you had to keep flipping back and forth between pointers and references, depending on the API you were using. And in typical C++ fashion, they invoked undefined behavior to answer the question of what happened if you accessed a not-yet-initialized reference (for example in a constructor).


There's nothing stopping a compiler from implementing Option<T> as a possibly-null reference when T is a pointer type, which would give you the same efficiency as having nulls in the language while making unchecked usage easy to catch at compile time. (I don't know if any languages do this currently, but there's no reason it couldn't be done. It does turn Option into a special built-in type, but it's universal enough for that to be a decent idea.)

And it's not clear that variables should have values before assignment, beyond "undefined"/"error to use this" -- explicitly initialising a variable to some sentinel value is clearer and safer than relying on the default value being something in particular.


> There's nothing stopping a compiler from implementing Option<T> as a possibly-null reference when T is a pointer type, which would give you the same efficiency as having nulls in the language while making unchecked usage easy to catch at compile time.

Rust does exactly that for its Option<T> type. In fact, it automatically works for a certain category of enums, not just for built-in ones. For example, this type:

    enum E<'r> {
        A(&'r u64),
        B,
    }
has the same size as a pointer (because references are guaranteed to be non-null), as has this type:

    enum F {
        A(Box<u64>),
        B,
    }
where Box<u64> is an owning pointer to a value on the heap. Rust's Option type doesn't get any special treatment and is not built-in, instead it's implemented like this in the standard library:

    pub enum Option<T> {
        None,
        Some(T)
    }


Specifically, the http://doc.rust-lang.org/stable/core/nonzero/struct.NonZero.... struct can be used to bring this kind of optimization to any type.


> There's nothing stopping a compiler from implementing Option<T> as a possibly-null reference when T is a pointer type, which would give you the same efficiency as having nulls in the language while making unchecked usage easy to catch at compile time.

I talked about this in my earlier comment, saying that we could solve the efficiency problem with "a super-clever compiler and a built-in optional type." (To respond to adwn's comment, I would view built-in support for sum datatypes as "built-in support for optional.")

> And it's not clear that variables should have values before assignment, beyond "undefined"/"error to use this" -- explicitly initialising a variable to some sentinel value is clearer and safer than relying on the default value being something in particular.

null IS a type which means "undefined/error to use this."

In the particular context of Java, null solves some of the problems of the language, like what happens when a constructor invokes a derived method which reaches up and accesses a not-yet-initialized variable in the superclass. I do think some of these problems could be solved in a different way, but it's not completely straightforward.
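That scenario can be reproduced in a few lines (the class names here are made up for illustration): the base-class constructor calls a method that a subclass overrides, and the override observes the subclass's field before its initializer has run, so it sees null.

```java
class Base {
    Base() {
        describe();  // virtual dispatch: runs Derived's override
    }
    void describe() {}
}

class Derived extends Base {
    String name = "derived";  // runs only after Base() returns
    static String seenDuringConstruction;

    @Override
    void describe() {
        // Invoked from Base's constructor, before `name` is assigned:
        seenDuringConstruction = name;
    }

    public static void main(String[] args) {
        Derived d = new Derived();
        System.out.println("during: " + Derived.seenDuringConstruction);  // during: null
        System.out.println("after:  " + d.name);                          // after:  derived
    }
}
```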

I am curious about how Rust solves some of these problems. I'll have to take a look.


> null IS a type which means "undefined/error to use this."

It causes runtime errors, rather than compile time ones. It also can move the place an error is detected to elsewhere (since passing null around and sticking it places works fine, right up until you try and use the value you expect), which can be a bit of a pain to trace back.

> In the particular context of Java, null solves some of the problems of the language, like what happens when a constructor invokes a derived method which reaches up and accesses a not-yet-initialized variable in the superclass. I do think some of these problems could be solved in a different way, but it's not completely straightforward.

Disallowing method calls before all variables have been initialised in the constructor seems like a relatively simple mostly-fix. For any non-final values, it'd work fine to just set them to some explicit dummy value, and final variables should probably be explicitly assigned anyway. (I can't think of many cases where having computation implicitly operate on undefined values is a good thing...)

Also, to adwn: thanks -- I figured Rust probably did something like that, but haven't really played with it much and hadn't bothered to go check.


> In the particular context of Java, null solves some of the problems of the language, like what happens when a constructor invokes a derived method which reaches up and accesses a not-yet-initialized variable in the super-class. I do think some of these problems could be solved in a different way, but it's not completely straightforward.

That's a non-issue. A derived class shouldn't be able to access any variables from the super-class until after the super-class's constructor has completed. All variables in the super-class should have to be initialized before the constructor completes. The super-class constructor shouldn't be able to read from a variable until after it can be statically determined that it has initialized the variable.

If a class can't work within these constraints, then it can declare the variable as Option<T> initialized to None, and then it is back to the normal Java semantics for that one variable.
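In Java terms, that opt-in could look like this sketch using java.util.Optional (the Cache class and its methods are hypothetical): only the one field that genuinely starts out absent pays the "maybe missing" tax, while everything else is initialized before the constructor completes.

```java
import java.util.Optional;

class Cache {
    // Only this field opts back in to "possibly absent" semantics,
    // playing the role of Option<T> initialized to None.
    private Optional<String> cached = Optional.empty();

    String get() {
        if (!cached.isPresent()) {
            cached = Optional.of(computeExpensively());
        }
        return cached.get();
    }

    private String computeExpensively() {
        return "expensive result";
    }
}
```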


> If a class can't work within these constraints, then it can declare the variable as Option<T> initialized to None, and then it is back to the normal Java semantics for that one variable.

By doing that you lose the safety of variables being marked final.

I wish there was a way to seal a variable such that it could not be changed (in the same sense that final works in, not truly readonly) after a certain point, but it could be before then.

I find myself writing far too many variables that should be final, and indeed are final after a certain point, but cannot be marked as such because said point is after the constructor. In particular with lazy-loading.
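One way to approximate that "final after a certain point" semantics today is a small write-once holder (a hypothetical sketch, not a standard class): it enforces the seal at runtime rather than compile time, throwing if written twice or read before being set.

```java
// Hypothetical one-shot cell: assignable exactly once, frozen afterwards.
final class SetOnce<T> {
    private T value;
    private boolean assigned;

    synchronized void set(T v) {
        if (assigned) throw new IllegalStateException("already set");
        value = v;
        assigned = true;
    }

    synchronized T get() {
        if (!assigned) throw new IllegalStateException("not yet set");
        return value;
    }
}
```

Unlike final, this is only a runtime guarantee, but it captures the "settable until a certain point, read-only after" lifecycle that lazy-loading needs.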


A final variable must be assigned in the constructor, so it already meets all of those constraints.


I quote:

> In particular with lazy-loading.

Lazy loading != assigned in the constructor


You can't lazy load a final variable. It has to be assigned in the constructor. If your point is that getting rid of null doesn't solve every single problem that exists, I agree. However, it in no way causes a degradation from the current status quo in nullable-by-default languages.


> Honestly I don't think including null in the language is a big deal. Academics tend to think it is a big deal because it makes it harder to prove programs correct, but almost nobody does that with Java programs, so it doesn't really matter to most programmers.

Speak for yourself. As a professional Java (well, mostly Scala) programmer, not a week goes by without a null pointer exception slowing down development.

> the language needs a way to represent the state that an object starts out with. For integers and floats that is 0, for booleans it is false, and for objects it is null.

No, you don't need a value, you just need semantics for what happens if you do that. E.g. you could add an UninitializedValueException that would only be thrown when you tried to access an uninitialized field. That would be effectively the same behaviour as currently in that case, but it would be much easier to diagnose than NullPointerException, since it can only be thrown when you access an uninitialized field, not e.g. because a map didn't contain a particular entry.
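A sketch of what that proposal might desugar to in today's Java (the exception class, the Account class, and the tracking flag are all hypothetical, standing in for what a compiler would generate per field):

```java
// Hypothetical: a dedicated exception for reads of uninitialized fields,
// distinct from NullPointerException.
class UninitializedValueException extends RuntimeException {
    UninitializedValueException(String field) {
        super("field '" + field + "' read before initialization");
    }
}

class Account {
    private String owner;
    private boolean ownerInitialized;  // compiler-generated flag

    void setOwner(String o) {
        owner = o;
        ownerInitialized = true;
    }

    String getOwner() {
        if (!ownerInitialized) throw new UninitializedValueException("owner");
        return owner;
    }
}
```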

> Even assuming we could somehow solve the efficiency problems with a super-clever compiler and a built-in optional type

This is disingenuous. It's not "somehow" with a "super-clever" compiler. It's a triviality.

> C++ got a non-nullable type in the form of references, and it didn't exactly turn the language into a paragon of sweetness and light. It just made things more confusing and annoying since you had to keep flipping back and forth between pointers and refereences, depending on the API you were using.

That's an argument for not trying to retrofit non-null references onto an existing language. It's not an argument for having nullable references in a new language.

> And in typical C++ fashion, they invoked undefined behavior to answer the question of what happened if you accessed a not-yet-initialized reference (for example in a constructor).

Don't do that then.


There are varying degrees of "proof", ie. we can have more or less confidence in an argument. In particular, we need to distinguish between "Mathematical proof" and "formal proof".

Mathematical proof is what we tend to think of when we see the word "proof", and it can be defined something like "a thoroughly convincing argument".

A "formal proof" is very different. If we're given a finite set of strings, and a finite set of string-rewriting-rules, then a "formal proof of X" is just "the order in which to apply the rules, such that we turn the given strings into X".

For example, the rule called "modus ponens" says:

    If I have strings of the form "X" and "X -> Y", then I can get the string "Y"
If I'm given the modus ponens rule, and the strings "A", "B" and "A -> B -> C", then this is a proof of "C":

    1) A -> B -> C   (given)
    2) A             (given)
    3) B -> C        (modus ponens with 2 and 1)
    4) B             (given)
    5) C             (modus ponens with 4 and 3)
When formal proofs were introduced, the idea was to choose the given strings (axioms) and rules such that "formal proof" meant pretty much the same as "Mathematical proof", ie. that formal proofs are convincing and that non-formal proofs aren't.

It's pretty easy to see that type systems (including Java's) are formal systems. The "given strings" are the types with primitive values, ie. I am 'given' the string "bool" since I can always say "true" or "false". The Java equivalents of strings like "A -> B" are methods, with argument type "A" and return type "B". The modus ponens rule says "calling a method with an argument of the correct type gives the result type" (ie. "A -> B" and "A" gives "B"). The Java equivalent of formal proofs are Java programs.

From this perspective, type-checking a Java program is checking the formal proof it represents; ie. ensuring that we only use the rules we were given (anything else is a syntax error, like `bool(5)`) and that the strings match up (eg. that we don't use modus ponens on "A" and "B -> C", which is a type error, like `3 + "hello"`).
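The correspondence can be made concrete with java.util.function.Function: applying a function is exactly one modus ponens step (the class and variable names here are made up for illustration).

```java
import java.util.function.Function;

class ModusPonens {
    public static void main(String[] args) {
        // "A -> B -> C": a curried two-argument function.
        Function<Integer, Function<String, Boolean>> aToBToC =
                a -> b -> a <= b.length();

        Integer a = 1;       // "A" (a given)
        String b = "two";    // "B" (a given)

        // Each .apply is one use of modus ponens:
        Function<String, Boolean> bToC = aToBToC.apply(a);  // "B -> C"
        Boolean c = bToC.apply(b);                          // "C"

        System.out.println(c);  // prints: true
    }
}
```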

> Academics tend to think it is a big deal because it makes it harder to prove programs correct, but almost nobody does that with Java programs, so it doesn't really matter to most programmers.

In fact, it's the exact opposite: every time a Java programmer compiles their code, they are formally verifying that their program is correct. The only Java programmers who don't prove their programs are correct are those few who've never compiled their code.

That might seem pedantic, ie. that I've redefined "proof" to win an argument, but in fact that alternative definition has been around for a century and, more importantly, it is the one PL researchers use (I should know, I am one ;) ).

The reason academics don't like NULL isn't that it makes proving things harder; it's that it makes proving things too easy! Java types are formal statements, by definition; Java programs are proofs of those statements, by definition; we have a whole bunch of machinery for checking these types, IDEs with auto-completion, automated refactoring, etc. so why not use it to prove stuff we care about? For example, a crypto library could encode some security properties as types to make sure consumer classes are using them properly. Then we could have our application require proof that our implementations are correct, like this:

    // Works for any LoginComponent, as long as it uses crypto in a provably safe way
    final class Application<LoginComponent> {
        public Application(LoginComponent lc, UsesCryptoSafely<LoginComponent> proof);
        // ...
    }
The reason Java programmers don't do this isn't that it's too hard. It's that it's far too easy:

    Application<MyLogin> app = new Application(new MyLogin(), NULL);
So easy, in fact, that it's useless: we can use "NULL" to literally prove anything in Java. These are still "formal proofs", but they're very far away from "Mathematical proofs", ie. telling me that your Java program type-checks doesn't convince me that it works the way you want, precisely because the axioms and rules used in Java cannot be used to convince me.

This isn't an ivory tower, pedantic, academical issue; it's an incredibly practical piece of software engineering: we want to find as many bugs as possible for the smallest cost possible. Java is already paying a high cost for its type system, especially since it has no type inference. Yet the existence of NULL undermines its usefulness for spotting bugs. Basically, Java hits an "anti-sweet spot" where we must annotate everything explicitly and architect our programs to work within the type system's constraints, yet it doesn't give us much confidence that we've squashed bugs.

Compare this to un(i)typed languages like Python, where we don't need annotations and our architectures aren't as constrained; we lose a little confidence but we save a huge cost.

Compare this to strongly typed languages like Haskell, where we're constrained by the type system and we occasionally need a couple of annotations, but we gain a large amount of confidence for that price.

> the language needs a way to represent the state that an object starts out with. For integers and floats that is 0, for booleans it is false, and for objects it is null.

I think you're confusing values with variables. The only value (object, etc.) which is NULL is, well, NULL. You can't change a NULL into an initialised instance; you can only replace it.

You can initialise a variable to be NULL, then redefine it to point at an object later. In that case, I'd say you should either be using an Optional value, or you're initialising your variable too early.


That was a great post! I like the term "anti-sweet spot" for Java's type system, and that's what it truly feels like in practice. Like complex magic spells that aren't that useful in the end.


"In fact, it's the exact opposite: every time a Java programmer compiles their code, they are formally verifying that their program is correct. The only Java programmers who don't prove their programs are correct are those few who've never compiled their code."

I don't think it's reasonable to use all of "prove", "their programs" and "correct" there. Weakening or qualifying any of those could produce a statement I emphatically agree with.


> Weakening or qualifying any of those could produce a statement I emphatically agree with.

My qualification was meant to be "according to the formal systems definition of the words 'prove', 'correct', etc., which was supposed to be applied to consistent systems; yet Java is inconsistent".

Well, I'd actually say that all languages are consistent, it's only particular interpretations of the encodings that may be inconsistent, but that's another discussion ;)


Ah, yeah, when you say you're proving something I think it's worth it to be specific about what, and extra explicit when it differs from the expectations of the audience. There's a lot of "correctness" as used by... anyone not focused on formal systems that is missed by your usage, I think.



