ewx: (geek)
[personal profile] ewx

Many languages that have references (pointers, whatever) have a distinguished null value that refers to "nowhere": None, a null pointer, undef etc. Dereferencing this value produces some kind of runtime error (a segmentation fault or an exception, for instance) so you have to check every possible use of any reference that can ever be null. (In some cases it's acceptable to take the exception or even the crash, but often it is not.)

What I'd like is a distinction between optional and mandatory references. An optional reference can be null, but the compiler won't let you dereference it. A mandatory reference is never null, and the compiler won't let you do anything that makes it null. Finally, all references would be either null or safe to dereference and no constructions which violated this would be permitted.

Then, there'd be some convenient way to get the compiler to do the check and let you dereference only known-to-be-safe optional pointers.

In something very C-like, you might have a new qualifier, optional, which applied to pointers only, and a with construction to safely strip the qualifier:

int *optional op = 0;         /* optional pointer to int */
int *optional uop;            /* default-initialized to 0 */
int x, *mp = &x;              /* mandatory pointer to int */
int *ump;                     /* error: must be initialized */

*mp;                          /* OK */
mp = op;                      /* error: cannot assign optional to mandatory */
mp = 0;                       /* error: mandatory pointer cannot be null */

*op;                          /* error: cannot dereference optional pointer */
op = mp;                      /* OK */ 
op = 0;                       /* OK */

with(op) {
  /* op behaves as if declared 'int *op' within this block */
  *op;                        /* OK */
} else {                      /* else part is optional */
  assert(op == NULL);         /* never fails */
}

(There'd also be some obvious rules about initialization but they are not the interesting bit.)
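
For comparison, some of this can be approximated today as a library. The following is a minimal C++17 sketch, not the proposed language feature: Mandatory, Optional, with and read_or are hypothetical names, and nothing stops other code from using plain pointers alongside them.

```cpp
#include <cassert>
#include <cstddef>

// Mandatory<T>: can only be built from a real object, so it is never null.
template <typename T>
class Mandatory {
public:
    explicit Mandatory(T& target) : p_(&target) {}
    T& operator*() const { return *p_; }
    T* operator->() const { return p_; }
private:
    T* p_;
};

// Optional<T>: may be null, and deliberately has no operator* or operator->.
template <typename T>
class Optional {
public:
    Optional() : p_(nullptr) {}                // default-initialized to null
    Optional(std::nullptr_t) : p_(nullptr) {}  // op = nullptr is allowed
    Optional(Mandatory<T> m) : p_(&*m) {}      // op = mp is allowed

    // The only way to reach the target: then_fn receives a Mandatory<T>
    // when the pointer is non-null, otherwise else_fn runs.
    template <typename Then, typename Else>
    void with(Then then_fn, Else else_fn) const {
        if (p_ != nullptr) {
            then_fn(Mandatory<T>(*p_));
        } else {
            assert(p_ == nullptr);  // never fails
            else_fn();
        }
    }
private:
    T* p_;
};

// Example use: read through an optional pointer, or fall back to a default.
int read_or(const Optional<int>& op, int fallback) {
    int result = fallback;
    op.with([&](Mandatory<int> mp) { result = *mp; },
            [] { /* op is null in this branch */ });
    return result;
}
```

With these definitions `*op` and `Mandatory<int> mp = op;` fail to compile, while `op = mp;` and `op = nullptr;` are accepted.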

(no subject)

Date: 2007-10-12 09:36 am (UTC)
From: [identity profile] sidheag.livejournal.com
Argh. Here is a totally useless answer: yes, there is such a language, because I remember hearing a conference talk about the feature. But I can't remember anything else about the talk, even where I heard it or anything that might serve as a search term... I'll see if my subconscious will turn up more.

(no subject)

Date: 2007-10-12 09:48 am (UTC)
From: [identity profile] hsenag.livejournal.com
Couldn't you knock most of this up with C++ templates? with(op) would have to be translated into some standard boilerplate, though.

(no subject)

Date: 2007-10-12 09:50 am (UTC)
From: [identity profile] ewx.livejournal.com
All the templates and overloading in the world won't stop you writing code using the existing pointers and references syntax.

(no subject)

Date: 2007-10-12 09:58 am (UTC)
From: [identity profile] ewx.livejournal.com
Of course it has to be a language that is usable and useful in other respects l-)

(no subject)

Date: 2007-10-12 10:03 am (UTC)
From: [identity profile] aardvark179.livejournal.com
Some dynamic languages take an interesting approach to this. In Magik for example _unset is still an object (everything is an object id, so everything is a reference really) with ancestry like this
sw:unset
    sw:enumerated_format_mixin
        sw:magikc_objects_mixin
        sw:object
    sw:unset_mixin

and all the associated method table etc. you'd expect. The important method on this is default() which takes one argument. On object (and everything that inherits from that) default returns _self, on unset it returns its argument. So the first thing you often do in a method that might be passed unset as some of its arguments is
_local a << a.default(some_sensible_default)
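
For readers who don't know Magik, the same idiom can be sketched in C++17 with std::optional, whose value_or member plays roughly the role of default(). The function greet and its parameter are made up for illustration.

```cpp
#include <cassert>
#include <optional>
#include <string>

// Hypothetical function taking an argument that may be absent.
std::string greet(std::optional<std::string> name = std::nullopt) {
    // Roughly: _local name << name.default("world")
    const std::string who = name.value_or("world");
    return "hello, " + who;
}
```

Unlike the Magik version this only works for values explicitly wrapped in std::optional; an ordinary null pointer has no methods to call.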

(no subject)

Date: 2007-10-12 10:09 am (UTC)
From: [personal profile] fanf
ML reference types are never null. You can make them nullable by using the 'a ref option type - option is just a normal data type that can wrap any type. http://www.standardml.org/Basis/option.html#SIG:OPTION.option:TY:SPEC

I'm sure there's an OO language with non-nullable references but my memory fails me...

(no subject)

Date: 2007-10-12 10:16 am (UTC)
From: [identity profile] aardvark179.livejournal.com
You can't stop them using the syntax, but you can overload the * and & operators. If it's in your standard include files it should certainly help.

I'm actually slightly horrified that C++ will let you overload that sort of thing.

(no subject)

Date: 2007-10-12 10:21 am (UTC)
From: [personal profile] cjwatson
The semantics would be the other way round from what you suggest, but perhaps extending gcc's 'nonnull' function attribute to variables would be the first step. A variable with such an attribute would require anything assigned to it to be verifiably non-null. Of course the standard library wouldn't (initially) use this so you would have to use something equivalent to a cast all the time (or wrapper functions like xmalloc that would return nonnull variables), and it would take a while to get to the point where you could say -Wdereference-maybe-null -Werror and actually compile anything ...
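
For reference, this is roughly what the existing function attribute looks like, as a sketch assuming GCC or Clang; checked_length is a made-up example function. GCC's -Wnonnull (enabled by -Wall) warns when a null constant is passed for the marked argument, but it does not track variables in the way suggested above.

```cpp
#include <cassert>
#include <cstring>

// GCC/Clang extension: argument 1 is declared as never null.
// The compiler may warn on checked_length(nullptr), and may also assume
// s is non-null when optimizing the function body.
__attribute__((nonnull(1)))
std::size_t checked_length(const char* s) {
    return std::strlen(s);
}
```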

(no subject)

Date: 2007-10-12 10:48 am (UTC)
From: [personal profile] simont
What annoys me, by contrast, is that there's only one null. I occasionally find myself having to declare phony variables which are never used except to provide an address that I know isn't used for anything else, so that I can use &my_phony_variable as a second special pointer value.

So perhaps we should solve both these problems by defining a basic "pointer" type in such a way that you can't ever create a value of that type which is null; then we'd also provide Haskell-like types with alternatives, along the lines of

datatype NullablePointer = NULL | PointsTo(char *);

That lets you distinguish between a mandatory and optional reference, and it also lets me define

datatype PointerOrThreeErrorIndicators = Error1 | Error2 | Error3 | PointsTo(char *);

for my special purposes.
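
In C++17 something close to this can be sketched with std::variant. All the type and function names here are illustrative, and std::visit stands in for pattern matching: it refuses to compile if the visitor omits a case.

```cpp
#include <cassert>
#include <string>
#include <variant>

// Hypothetical names mirroring the datatype declaration above.
struct Error1 {};
struct Error2 {};
struct Error3 {};

class PointsTo {
public:
    explicit PointsTo(const char& c) : p_(&c) {}
    const char& get() const { return *p_; }
private:
    const char* p_;  // only ever set from a reference, so never null
};

using PointerOrThreeErrorIndicators =
    std::variant<Error1, Error2, Error3, PointsTo>;

// One overload per alternative; leaving one out is a compile-time error.
struct Describe {
    std::string operator()(Error1) const { return "error 1"; }
    std::string operator()(Error2) const { return "error 2"; }
    std::string operator()(Error3) const { return "error 3"; }
    std::string operator()(const PointsTo& p) const {
        return std::string("points to '") + p.get() + "'";
    }
};

std::string describe(const PointerOrThreeErrorIndicators& v) {
    return std::visit(Describe{}, v);
}
```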

(no subject)

Date: 2007-10-12 10:58 am (UTC)
From: [personal profile] fanf
Some Apache projects use Java 5 annotations to mark parameters as non-nullable.

(no subject)

Date: 2007-10-12 11:01 am (UTC)
From: [identity profile] pjc50.livejournal.com
References _should_ be the "reference which cannot be null", but GCC at least will happily let me write:

int * ptr = 0;
int &ref = *ptr;

... which then blows up on dereferencing ref. This is unhelpful.
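
One way to make the failure happen at the point of binding instead is a checked conversion. A minimal sketch; checked_ref is a hypothetical helper, not anything GCC or the standard library provides.

```cpp
#include <cassert>
#include <stdexcept>

// Turn a possibly-null pointer into a reference, failing at the conversion
// rather than at some later use of the reference.
template <typename T>
T& checked_ref(T* p) {
    if (p == nullptr) {
        throw std::invalid_argument("null pointer bound to reference");
    }
    return *p;
}
```

Writing `int &ref = checked_ref(ptr);` then throws immediately when ptr is null, though it relies on everyone remembering to use the helper.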

(no subject)

Date: 2007-10-12 11:06 am (UTC)
From: [identity profile] ewx.livejournal.com
C++ can't stop you leaking references to auto or dynamically allocated objects beyond their lifetime, either; you'd need a fair bit of change to C/C++ to make their reference type(s) safe.

(no subject)

Date: 2007-10-12 11:37 am (UTC)
From: [personal profile] fanf
Cyclone is a C-like language with lots of safety features and static checking added. It has non-null pointer types declared with an @ instead of a *.

(no subject)

Date: 2007-10-12 11:57 am (UTC)
From: [identity profile] baljemmett.livejournal.com
If it weren't for that (plus the fact that it's, y'know, icky), I'd point out that Visual Basic has Optional parameters:

Private Sub FrobFoo(ByRef theFoo As Foo, Optional howToFrob As FrobType)
    If IsMissing(howToFrob) Then
        DefaultFrob theFoo
    Else
        Select Case howToFrob
            Case ...
        End Select
    End If
End Sub


[NB: I haven't touched VB itself in eight years or so, but this is how I remember it working.]

I also heartily second simont's suggestion regarding mandatory-pointer-or-something-else types. I once implemented something similar as a tagged-union type thingy in C, but it's no substitute for proper language support.

[Sorry for the comment-spam, forgot about formatting the first time!]

(no subject)

Date: 2007-10-12 12:04 pm (UTC)
From: [identity profile] sidheag.livejournal.com
Ah, thank you - that's the one I heard a talk about.

(no subject)

Date: 2007-10-12 12:16 pm (UTC)
From: [identity profile] ewx.livejournal.com
Does it stop you writing code that neglects to check optional values are indeed usable?

(no subject)

Date: 2007-10-12 12:37 pm (UTC)
From: [personal profile] fanf
If I understand correctly, I think CLU has a similar arrangement to ML.

(no subject)

Date: 2007-10-12 01:05 pm (UTC)
From: [identity profile] baljemmett.livejournal.com
Alas, no, hence the comment about it not meeting your "usable and useful" criterion -- as I recall, all its type-checking is done at runtime :-/

C# has "nullable" types, though, which have some similar semantics -- any use in a context where the corresponding non-nullable type is required will result in a compile-time error. I haven't worked with C# much yet though, so am not entirely sure of the semantics apart from that they seemed sensible when I first encountered them!

(no subject)

Date: 2007-10-12 02:00 pm (UTC)
From: [personal profile] fanf
C#'s nullable types are modified versions of value types (like int) and reference types only come in nullable form.

(no subject)

Date: 2007-10-12 02:12 pm (UTC)
From: [identity profile] pne.livejournal.com
I occasionally find myself having to declare phony variables which are never used except to provide an address that I know isn't used for anything else

Ah, yes. I remember the annoyance I encountered when trying to do this with a special string in Java.

Can't use a constant string (because the language guarantees that all instances of the same constant string are the same object), and the bug-checking plugin we had didn't like instantiating a new String(), giving a warning if I tried (roughly, "just go ahead and use a constant empty string; it's the same thing for nearly all intents and purposes").

(no subject)

Date: 2007-10-12 02:16 pm (UTC)
From: [identity profile] baljemmett.livejournal.com
Oh... That's a shame, but seems obvious now I remember they weren't added until the second version of the language. Ho-hum.

(no subject)

Date: 2007-10-12 02:47 pm (UTC)
From: [personal profile] gerald_duck
C++'s references have a lot of problems (including being immutable), but they are at least tantamount to a pointer that shouldn't be null. Saying int *i_ptr; int &i_ref = *i_ptr; isn't required to fault immediately at run-time if i_ptr is 0, but that's at least an option for the implementation or some correctness-checking instrumentation.

In our codebase we have a lot of trouble with boost::shared_ptr: when they are passed between code modules it's unclear whether they're allowed to be null or not. A non-zero shared_ptr would be a nice type.

But, then again, there are lots of constraints on values it would be nice to be able to have checked automatically; null pointers are just one of the more common examples.
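
The non-null shared_ptr wished for above might look something like the following. This is only a sketch: it uses std::shared_ptr, the standard counterpart of boost::shared_ptr, and NonNullSharedPtr is a hypothetical wrapper that is not part of Boost or the standard library.

```cpp
#include <cassert>
#include <memory>
#include <stdexcept>
#include <utility>

template <typename T>
class NonNullSharedPtr {
public:
    // The null check happens once, where the pointer crosses into the wrapper.
    explicit NonNullSharedPtr(std::shared_ptr<T> p) : p_(std::move(p)) {
        if (!p_) {
            throw std::invalid_argument("NonNullSharedPtr given a null pointer");
        }
    }

    // Copy only: declaring these suppresses the implicit move operations,
    // so a "moved-from" wrapper is never left holding null.
    NonNullSharedPtr(const NonNullSharedPtr&) = default;
    NonNullSharedPtr& operator=(const NonNullSharedPtr&) = default;

    T& operator*() const { return *p_; }
    T* operator->() const { return p_.get(); }
    const std::shared_ptr<T>& get() const { return p_; }
private:
    std::shared_ptr<T> p_;
};
```

A function taking NonNullSharedPtr<Foo> then documents in its signature that null is not allowed, at the cost of a run-time check rather than a compile-time one.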

(no subject)

Date: 2007-10-12 03:57 pm (UTC)
From: [identity profile] ewx.livejournal.com

I like this idea. I'm not sure how one would do it in the C-like model above though - you'd have to define entirely new pointer-like types.

Another important kind of undereferenceable pointer is the pointer just beyond the end of an array. This (and its generalization to iterators) is a useful thing to be able to think about, as it lets you use half-open intervals to refer to (sub-)ranges of arrays. Anything that supported multiple invalid values should support it fairly easily.

(Other bad pointers include ones that you've freed, or pointers to auto variables that have gone out of scope. I think garbage collection is the underlying answer here.)
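
The half-open interval idiom mentioned above looks like this in C++; sum_range is just an illustrative name.

```cpp
#include <cassert>

// Sum the half-open range [first, last). `last` may be the one-past-the-end
// pointer: it is compared against but never dereferenced.
int sum_range(const int* first, const int* last) {
    int total = 0;
    for (const int* p = first; p != last; ++p) {
        total += *p;
    }
    return total;
}
```

For `int a[5]`, the whole array is `sum_range(a, a + 5)` and an empty range is any `sum_range(p, p)`, with no special case needed for either.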

(no subject)

Date: 2007-10-12 04:13 pm (UTC)
From: [personal profile] simont
Hmm. I've never been entirely convinced by garbage collection of mutable data, actually. If I've got a piece of data X that I'm changing – say, for example, that it's a list/tree/container or a set of statistics into which I'm funnelling data as it comes in – then there's a definite point at which X becomes obsolete, which is the point at which I finish reading its contents back out: the point at which I've read out the final statistics, or enumerated the contents of the container, and used them for something. After that, any further data shovelled into X will be effectively thrown into a black hole.

So if any other part of the code has a pointer to X, it's now semantically stale even if not pointer-dereferencingly stale: making use of that pointer is a bug, and having the GC environment conscientiously preserve X so that such code can accidentally shovel things into X and not notice is not a sensible move. I want to specifically mark X as finished with, so that any part of the program which subsequently tries to put things into it will throw an exception that I can debug.

Admittedly, the usual C approach of declaring X finished with and having future attempts to modify it lead to subtle undefined behaviour is not optimal either. But out of the two options above, I'd pick the non-garbage-collected one every time, because I prefer my misbehaviour unsubtle so I can spot it and fix it easily.

And that's without even getting into the question of what happens when X is not merely data but contains handles to some sort of external I/O; you certainly want to close output files (for example) explicitly rather than relying on the GC to get round to it at some point, and after you've closed them the last thing you want is for the data structure that described them to be "helpfully" left around.

Garbage collection is at its best when dealing with non-mutable data, because in that situation there's no semantic repercussion at all. So purely functional languages are its absolute sweet spot. But for normal procedural code with mutable data? Call me a C programmer if you must, but I've never been convinced.

(no subject)

Date: 2007-10-12 04:34 pm (UTC)
From: [personal profile] fanf
Region inference can sometimes spot pointers that will never be dereferenced and therefore free the data before a GC would. However, because it is a conservative analysis, it can lead to space leaks when it infers a longer lifetime than necessary for some data. Practical implementations of compile-time region inference couple it with a run-time GC.

I like the C++ RAII model, and how it works so nicely when scope = lifetime. I'd like to see a good compromise between that and GC to deal with more complicated lifetimes.
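
A minimal illustration of the scope = lifetime case, with made-up names: Tracked stands in for any resource, and the counter exists only to make the lifetime observable.

```cpp
#include <cassert>

// Hypothetical resource that counts how many instances are currently alive.
class Tracked {
public:
    explicit Tracked(int& live) : live_(live) { ++live_; }
    ~Tracked() { --live_; }
    Tracked(const Tracked&) = delete;
    Tracked& operator=(const Tracked&) = delete;
private:
    int& live_;
};

int live_inside_scope(int& live) {
    Tracked t(live);  // acquired here
    return live;      // value is copied out before t is destroyed
}                     // released here, on every exit path
```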

Stoopid Question from A VB developer...

Date: 2007-10-12 05:34 pm (UTC)
From: [identity profile] hairyears.livejournal.com
Exactly how do you dereference a null pointer? Surely this is simply pointing it to null...

(no subject)

Date: 2007-10-12 07:31 pm (UTC)
From: [identity profile] mstevens.livejournal.com
http://nice.sourceforge.net/ implements something like this on the JVM - variables by default can't be null, and writing code that makes them null is a type error. Or, if you make it nullable, it's a type error to use them without checking if they're null or not first.

http://nice.sourceforge.net/safety.html has some details.

Re: Stoopid Question from A VB developer...

Date: 2007-10-12 08:16 pm (UTC)
From: [identity profile] catamorphism.livejournal.com
That's easy:

int *p = 0;
printf("%d\n", *p);

Any C compiler will happily accept this code, which will coredump when I run the resulting executable.

Re: Stoopid Question from A VB developer...

Date: 2007-10-12 10:09 pm (UTC)
From: [identity profile] nunfetishist.livejournal.com
Any C compiler will happily accept this code... when wrapped up in a few extra lines of boilerplate :)

Re: Stoopid Question from A VB developer...

Date: 2007-10-12 10:14 pm (UTC)
From: [identity profile] catamorphism.livejournal.com
Well, yes. You also have to copy it, paste it into a buffer in your favorite text editor, save it as an ASCII file, and invoke your compiler, but I thought it was safe to omit those steps.

(no subject)

Date: 2007-10-13 04:18 pm (UTC)
From: [identity profile] ewx.livejournal.com
I'm not quite clear why this isn't completely orthogonal to automated memory management. Surely what you wanted was for X to mark itself as obsolete, say by X.close() if it was a file, and to generate a runtime error if it was used afterwards, and this is a completely separate question to whether you free() it or just forget about the pointer(s) to it?
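
Something like the following sketch, where Stats is a hypothetical stand-in for X. The closed flag is independent of when, or how, the object's memory is eventually reclaimed.

```cpp
#include <cassert>
#include <stdexcept>
#include <vector>

class Stats {
public:
    void add(double sample) {
        require_open();
        samples_.push_back(sample);
    }

    // Read out the final total and mark the object as finished with.
    double close() {
        require_open();
        closed_ = true;
        double total = 0;
        for (double s : samples_) total += s;
        return total;
    }
private:
    void require_open() const {
        if (closed_) throw std::logic_error("Stats used after close()");
    }
    std::vector<double> samples_;
    bool closed_ = false;
};
```

Any stale pointer that later calls add() gets an exception instead of silently feeding data into a black hole.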

(no subject)

Date: 2007-10-13 05:35 pm (UTC)
From: [personal profile] simont
Well, once you've called close() and future accesses to the thing are going to give an error anyway, you might as well garbage-collect it now rather than waiting for any remaining stale pointers to go away, because you don't want memory to be retained long-term based on pointers that will (must) never be accessed again merely because they still happen to exist. The obvious way to do that is to have close() also be free().

I suppose you could just mark it as freeable and wait for the next run of a garbage collector to collect it along with everything else, if you really wanted to.

(no subject)

Date: 2007-10-13 10:38 pm (UTC)
From: [identity profile] bellinghman.livejournal.com
Not on pointer types, it won't. Only on objects pretending to be pointers.

Re: Stoopid Question from A VB developer...

Date: 2007-10-19 03:56 am (UTC)
From: [identity profile] xlerb.livejournal.com
mmap(0, 4096, PROT_READ|PROT_WRITE, MAP_FIXED|MAP_ANON|MAP_PRIVATE, -1, 0);

There. Fixed the coredump.

Re: Stoopid Question from A VB developer...

Date: 2007-10-19 04:00 am (UTC)
From: [identity profile] catamorphism.livejournal.com
Sure, but he just wanted to know how to dereference a null pointer :-)

Re: Stoopid Question from A VB developer...

Date: 2007-10-19 08:01 am (UTC)
From: [identity profile] ewx.livejournal.com
Might make other programs crash though...
