Simple Tokenizer & Segfault tutorial

I also wonder why posmax fails to take a dependent type ($showtype is still
size_t) in the slightly updated code (zip file attached).

sstream.zip (2.21 KB)

Yes, you can update them one by one.

Actually, you have pointed out a crucial issue in supporting
references and dependent types within the same framework.
This issue is addressed in the following paper:

A Modality for Safe Resource Sharing and Code Reentrancy

which is available on my homepage: www.cs.bu.edu/~hwxi

–Hongwei

Sorry, I am not completely sure what is trying to be achieved here; it
looks like p->pos should actually be p->sstream_pos.

If that is the case, then we are naming sstream_pos (a function) the same
as sstream_type.sstream_pos, which may be a bit confusing.

Also, it now seems that sstream_pos (the function) is no longer
abstract: I guess this is just shorthand, since how to do so may be
obvious (that is, keep the implementation of sstream_pos as a cloptr
inside some other stream-making function, e.g. lin_strstream_make()).

One other thing: for a general-use linear version of sstream, I guess
strptr would be the way to go (in both ATS and ATS2)?

Thanks again for the advice on style.

It is a dependent type: size_t is overloaded to stand for both
size_int_t0ype (the integer-indexed version) and size_t0ype, so
$showtype displays size_t in either case.

If you put your entire code somewhere on-line, I will be happy to take a
look and possibly give you some comments.

You are definitely right about the juggling.

One question: strstream is already abstract as far as accessing the
associated data goes (I think), or what do you mean? "The name abstract
type refers to a type such that values of the type are represented in a
way that is completely hidden from users of the type."
This seems to be the case currently for strstream. As far as not knowing
what functions may be needed, I suppose this is true in the general case,
but LL(1) grammars suffice for my purposes, so I don’t even need the "prev"
function.

I suppose the “assume” keyword could add some flexibility in the
underlying data supplied to the stream.

Thanks again for the feedback (hopefully I have not grossly misunderstood).
Your suggestion on strings is more CPU- and memory-efficient; I’ll add it to
the final version along with a better string_caps function.

Oh, I see. Your first version of whileCharTst is non-terminating, which
caused a segfault.

I am quite curious to know why you had a segfault. In ATS, a segfault is
most likely caused by infinite recursion; otherwise, segfaults should not
occur unless you use some unsafe features.

/media/RAID5/share/ATS_learning/toCNF2.dats: 6385(line=268, offs=7) –
6540(line=270, offs=16): error(ccomp): the dynamic variable [Tzr] is not a
function argument.
exit(ATS): uncaught exception:

_2fhome_2ffac2_2fhwxi_2fresearch_2fATS_2fIMPLEMENT_2fGeizella_2fAnairiats_2fsvn_2fats_2dlang_2fsrc_2fats_error_2esats__FatalErrorException(1233)

I’m not sure if I’m looking at the right area of code, but in parseF
you have a local function ‘loop’ that uses Tzr, an argument to parseF.
The type of ‘loop’ is a C function type, not a closure, so it can’t close
over that value. Either pass Tzr into ‘loop’ as an argument or make
‘loop’ a ‘cloref’ function:

implement
parseF (Tzr: GRtokenizer): GREXP = let
  fun loop (term: GREXP):<cloref1> GREXP = case+ Tzr.peek () of
    | TKand () => (Tzr.next (); loop (GRconj (term, parseD (Tzr)))) (* another Disjunction *)
    | _ => term
in
  loop (parseD (Tzr)) (* first, parse the first D in the list *)
end

I have to admit, my code looked a lot worse than this, abstype aside. I
was trying to remember to use case/when but couldn’t recall the syntax. I
think it isn’t mentioned in the ATS Book until the section “Dependent Types
for Debugging”, and might be worth mentioning when “case” is introduced, or
even in the section on conditionals (imho).

I’ve attached the three files implementing this with the sstream abstype in
case anyone is ever interested.

Additionally, I was able to make a more reasonable version of the
string -> caps(string) function using your sstream specification.

Thanks again,
Brandon

sstream_dats.txt (2.37 KB)

sstream_sats.txt (628 Bytes)

toCNF2_dats.txt (3.65 KB)

Sorry, I am not completely sure what is trying to be achieved here; it
looks like p->pos should actually be p->sstream_pos.

Yes.

If that is the case, then we are naming sstream_pos (a function) the
same as sstream_type.sstream_pos, which may be a bit confusing.

One is a label and the other is a function name; they can never mix.

Or you can drop the prefix ‘sstream_’ from the name of the label.

Also, it now seems that sstream_pos (the function) is no longer
abstract: I guess this is just shorthand, since how to do so may be
obvious (that is, keep the implementation of sstream_pos as a cloptr
inside some other stream-making function, e.g. lin_strstream_make()).

Not sure what you mean here. It is abstract to the user; it cannot be
abstract to the implementor.

One other thing: for a general-use linear version of sstream, I guess
strptr would be the way to go (in both ATS and ATS2)?

Yes. But I would call it ‘mystring’ and then implement ‘mystring’ based on
strptr. In this way, you can use other data structures for ‘mystring’;
for instance, you may want some form of buffering. I cannot over-emphasize
the importance of using abstract types like this.

//
// HX: I outline a reasonable implementation as follows.
//
// Programming in ATS is actually a bit like doing OOP; it is only simpler.
//
// An abstype is like a class. In the following outlined code, you need
// to implement the abstype sstream; you can actually do the implementation
// in C if you like.
//

(* ********************************* Begin CODE
******************************** *)

staload "prelude/SATS/string.sats"
staload "prelude/SATS/printf.sats"
staload "libc/SATS/stdio.sats"

#define NUL '\000'

exception UnforseenLexeme of ()

abstype sstream // for string streams

extern
fun sstream_get (ss: sstream): char
extern
fun sstream_inc (ss: sstream): void
extern
fun sstream_getinc (ss: sstream): char

extern
fun sstream_pos (ss: sstream): size_t
extern
fun sstream_lexeme (ss: sstream, i: size_t, j: size_t): string

extern
fun whileCharTst
(ss: sstream, chtst: char -> bool): void
implement
whileCharTst
(ss, chtst) = let
val c = sstream_get (ss)
in
if chtst (c) then (sstream_inc (ss); whileCharTst (ss, chtst))
end

(* ****** ****** *)

datatype GRTOK =
| TKgene of string
| TKand of ()
| TKor of ()
| TKlpar of ()
| TKrpar of ()
| TKEND of ()

extern
fun getToken (ss: sstream): GRTOK

extern
fun isAlpha(c:char): bool
extern
fun isWhiteSpace(c:char): bool

implement
getToken (ss) = let
//
val () = whileCharTst (ss, isWhiteSpace)
//
val p0 = sstream_pos (ss)
val c0 = sstream_getinc (ss)
//
in
//
case+ 0 of
| _ when c0 = NUL => TKEND ()
| _ when c0 = '(' => TKlpar ()
| _ when c0 = ')' => TKrpar ()
| _ => let
val () = whileCharTst (ss, isAlpha)
val p1 = sstream_pos (ss)
val word = sstream_lexeme (ss, p0, p1)
in
case+ word of
| "AND" => TKand ()
| "OR" => TKor ()
| _ => TKgene (word)
end // end of [_]
//
end // end of [getToken]