Performance of case

I am trying to understand the performance of my code running on an Arduino
Uno. I optimize -O2, and by pulsing digital pins and using a scope, I can
measure where times goes. Other than interrupts from the UART, the
measurements are pretty consistant.

I ran into some code that is a magnitude slower compared to other code. I
have put three dots where I measure, in the following code.

For reference, calls to things like read_hex() which read two chars from
the UART and make conversion calls, run in about 5us.

The time from calling process(,) to the third dot in the case, noting that
the case is 21th in the total of 24 cases, is 200us. Nothing else in the
entire program takes this much time.

How is this implemented in C? I don’t know the generated C well, so I am
hoping an expert can explain. I would have thought some kind of jump table,
but even 21 conditionals should not take 200us. Calls to read_hex() are
probably just as many instructions as 21 conditionals.

If this is a known inefficiency, is there a better way to code a big case
like this?

FYI, someone will probably say yuck to the cast. The int comes from the
outside world and conceptually the code is cleaner as characters for
reading. But I am open to better approaches.

Mike

fun process (c:int, s:state) : state = let
.
val ch = cast{char}©
.
val state = case 0 of
.
| _ when ch = ‘a’ => let
val a = read_hex()
in @{ addr = a, ps = idle } end

process(c, s)

Oh, nice.

And can I specify more than one transition such that all the legal paths
between states are tracked at the level of types?

That would be super cool, as I can put these in one place, easy to see, and
the implementation which might be long and verbose can’t get into trouble.On Sunday, September 27, 2015 at 11:07:46 AM UTC-6, gmhwxi wrote:

s: &state >> _

means:

s: &state >> state

Imagine a case where ‘state’ is indexed by integer; then you could
write:

s: &state(n) >> state(n+1)

Or you could have something like

s: &state(Idle) >> state(Start) // the state changes from Idle to Start
after this call

if you want to track a finite state machine at the level of types.

On Sunday, September 27, 2015 at 12:43:00 PM UTC-4, Mike Jones wrote:

This looks like a good optimization if I need some more performance.

What does the “>> _” mean?

On Saturday, September 26, 2015 at 6:35:30 PM UTC-6, gmhwxi wrote:

Could you include the entire code for ‘process’?

On the surface, the C compiler may like the following style of coding
better:

fun proces(c: int, s: &state >> _): void =
val ch = cast{char}(c)
in
case 0 of
| _ when ch = ‘a’ => let
val a = read_hex()
in
s.addr := a; s.ps = idle
end
| …
end

On Saturday, September 26, 2015 at 7:57:52 PM UTC-4, Mike Jones wrote:

I am trying to understand the performance of my code running on an
Arduino Uno. I optimize -O2, and by pulsing digital pins and using a scope,
I can measure where times goes. Other than interrupts from the UART, the
measurements are pretty consistant.

I ran into some code that is a magnitude slower compared to other code.
I have put three dots where I measure, in the following code.

For reference, calls to things like read_hex() which read two chars
from the UART and make conversion calls, run in about 5us.

The time from calling process(,) to the third dot in the case, noting
that the case is 21th in the total of 24 cases, is 200us. Nothing else in
the entire program takes this much time.

How is this implemented in C? I don’t know the generated C well, so I
am hoping an expert can explain. I would have thought some kind of jump
table, but even 21 conditionals should not take 200us. Calls to read_hex()
are probably just as many instructions as 21 conditionals.

If this is a known inefficiency, is there a better way to code a big
case like this?

FYI, someone will probably say yuck to the cast. The int comes from the
outside world and conceptually the code is cleaner as characters for
reading. But I am open to better approaches.

Mike

fun process (c:int, s:state) : state = let
.
val ch = cast{char}(c)
.
val state = case 0 of
.
| _ when ch = ‘a’ => let
val a = read_hex()
in @{ addr = a, ps = idle } end

process(c, s)

This looks like a good optimization if I need some more performance.

What does the “>> _” mean?On Saturday, September 26, 2015 at 6:35:30 PM UTC-6, gmhwxi wrote:

Could you include the entire code for ‘process’?

On the surface, the C compiler may like the following style of coding
better:

fun proces(c: int, s: &state >> _): void =
val ch = cast{char}(c)
in
case 0 of
| _ when ch = ‘a’ => let
val a = read_hex()
in
s.addr := a; s.ps = idle
end
| …
end

On Saturday, September 26, 2015 at 7:57:52 PM UTC-4, Mike Jones wrote:

I am trying to understand the performance of my code running on an
Arduino Uno. I optimize -O2, and by pulsing digital pins and using a scope,
I can measure where times goes. Other than interrupts from the UART, the
measurements are pretty consistant.

I ran into some code that is a magnitude slower compared to other code. I
have put three dots where I measure, in the following code.

For reference, calls to things like read_hex() which read two chars from
the UART and make conversion calls, run in about 5us.

The time from calling process(,) to the third dot in the case, noting
that the case is 21th in the total of 24 cases, is 200us. Nothing else in
the entire program takes this much time.

How is this implemented in C? I don’t know the generated C well, so I am
hoping an expert can explain. I would have thought some kind of jump table,
but even 21 conditionals should not take 200us. Calls to read_hex() are
probably just as many instructions as 21 conditionals.

If this is a known inefficiency, is there a better way to code a big case
like this?

FYI, someone will probably say yuck to the cast. The int comes from the
outside world and conceptually the code is cleaner as characters for
reading. But I am open to better approaches.

Mike

fun process (c:int, s:state) : state = let
.
val ch = cast{char}(c)
.
val state = case 0 of
.
| _ when ch = ‘a’ => let
val a = read_hex()
in @{ addr = a, ps = idle } end

process(c, s)

Yes, you can. Here is a sketch showing a way to do this:

https://github.com/githwxi/ATS-Postiats-test/blob/master/contrib/hwxi/TEST10/test15.datsOn Sunday, September 27, 2015 at 5:58:24 PM UTC-4, Mike Jones wrote:

Oh, nice.

And can I specify more than one transition such that all the legal paths
between states are tracked at the level of types?

That would be super cool, as I can put these in one place, easy to see,
and the implementation which might be long and verbose can’t get into
trouble.

On Sunday, September 27, 2015 at 11:07:46 AM UTC-6, gmhwxi wrote:

s: &state >> _

means:

s: &state >> state

Imagine a case where ‘state’ is indexed by integer; then you could
write:

s: &state(n) >> state(n+1)

Or you could have something like

s: &state(Idle) >> state(Start) // the state changes from Idle to Start
after this call

if you want to track a finite state machine at the level of types.

On Sunday, September 27, 2015 at 12:43:00 PM UTC-4, Mike Jones wrote:

This looks like a good optimization if I need some more performance.

What does the “>> _” mean?

On Saturday, September 26, 2015 at 6:35:30 PM UTC-6, gmhwxi wrote:

Could you include the entire code for ‘process’?

On the surface, the C compiler may like the following style of coding
better:

fun proces(c: int, s: &state >> _): void =
val ch = cast{char}(c)
in
case 0 of
| _ when ch = ‘a’ => let
val a = read_hex()
in
s.addr := a; s.ps = idle
end
| …
end

On Saturday, September 26, 2015 at 7:57:52 PM UTC-4, Mike Jones wrote:

I am trying to understand the performance of my code running on an
Arduino Uno. I optimize -O2, and by pulsing digital pins and using a scope,
I can measure where times goes. Other than interrupts from the UART, the
measurements are pretty consistant.

I ran into some code that is a magnitude slower compared to other
code. I have put three dots where I measure, in the following code.

For reference, calls to things like read_hex() which read two chars
from the UART and make conversion calls, run in about 5us.

The time from calling process(,) to the third dot in the case, noting
that the case is 21th in the total of 24 cases, is 200us. Nothing else in
the entire program takes this much time.

How is this implemented in C? I don’t know the generated C well, so I
am hoping an expert can explain. I would have thought some kind of jump
table, but even 21 conditionals should not take 200us. Calls to read_hex()
are probably just as many instructions as 21 conditionals.

If this is a known inefficiency, is there a better way to code a big
case like this?

FYI, someone will probably say yuck to the cast. The int comes from
the outside world and conceptually the code is cleaner as characters for
reading. But I am open to better approaches.

Mike

fun process (c:int, s:state) : state = let
.
val ch = cast{char}(c)
.
val state = case 0 of
.
| _ when ch = ‘a’ => let
val a = read_hex()
in @{ addr = a, ps = idle } end

process(c, s)

s: &state >> _

means:

s: &state >> state

Imagine a case where ‘state’ is indexed by integer; then you could
write:

s: &state(n) >> state(n+1)

Or you could have something like

s: &state(Idle) >> state(Start) // the state changes from Idle to Start
after this call

if you want to track a finite state machine at the level of types.On Sunday, September 27, 2015 at 12:43:00 PM UTC-4, Mike Jones wrote:

This looks like a good optimization if I need some more performance.

What does the “>> _” mean?

On Saturday, September 26, 2015 at 6:35:30 PM UTC-6, gmhwxi wrote:

Could you include the entire code for ‘process’?

On the surface, the C compiler may like the following style of coding
better:

fun proces(c: int, s: &state >> _): void =
val ch = cast{char}(c)
in
case 0 of
| _ when ch = ‘a’ => let
val a = read_hex()
in
s.addr := a; s.ps = idle
end
| …
end

On Saturday, September 26, 2015 at 7:57:52 PM UTC-4, Mike Jones wrote:

I am trying to understand the performance of my code running on an
Arduino Uno. I optimize -O2, and by pulsing digital pins and using a scope,
I can measure where times goes. Other than interrupts from the UART, the
measurements are pretty consistant.

I ran into some code that is a magnitude slower compared to other code.
I have put three dots where I measure, in the following code.

For reference, calls to things like read_hex() which read two chars from
the UART and make conversion calls, run in about 5us.

The time from calling process(,) to the third dot in the case, noting
that the case is 21th in the total of 24 cases, is 200us. Nothing else in
the entire program takes this much time.

How is this implemented in C? I don’t know the generated C well, so I am
hoping an expert can explain. I would have thought some kind of jump table,
but even 21 conditionals should not take 200us. Calls to read_hex() are
probably just as many instructions as 21 conditionals.

If this is a known inefficiency, is there a better way to code a big
case like this?

FYI, someone will probably say yuck to the cast. The int comes from the
outside world and conceptually the code is cleaner as characters for
reading. But I am open to better approaches.

Mike

fun process (c:int, s:state) : state = let
.
val ch = cast{char}(c)
.
val state = case 0 of
.
| _ when ch = ‘a’ => let
val a = read_hex()
in @{ addr = a, ps = idle } end

process(c, s)

I tried and failed. I got a compile error about over constraining a type, which I think means my type is not indexed. I am using an Enumerative Datatype as state, with value constructors Idle, Start, Working, etc. is it possible to index it or do I have to index an int and some how create names for each int with some function like

make_state (int) = State(int)

With some constraints on the int to one value?

My intent was to possibly carry a value in the datatype so the state carried extra information with it. So that would mean trying not to use a constrained int.

You may need to first read Part IV of the following book:

http://ats-lang.github.io/DOCUMENT/INT2PROGINATS/HTML/HTMLTOC/book1.html

The simplest way I could come up with is based on abstract views; it is
given in the
following example:

Here is the idea:

To implement Blink, let us introduce 3 states: setup, light_on and light_off

Calling setup gets you to the ‘setup’ state. The transition functions are
listed below:

fun light_on1 : setup → light_on
fun light_on2 : light_off → light_on
fun light_off : light_on → light_off

Essentially, these functions correspond to certain edges in a state diagram
depicting
the ‘Blink’ system.On Sunday, September 27, 2015 at 7:44:46 PM UTC-4, Mike Jones wrote:

I tried and failed. I got a compile error about over constraining a type,
which I think means my type is not indexed. I am using an Enumerative
Datatype as state, with value constructors Idle, Start, Working, etc. is it
possible to index it or do I have to index an int and some how create names
for each int with some function like

make_state (int) = State(int)

With some constraints on the int to one value?

My intent was to possibly carry a value in the datatype so the state
carried extra information with it. So that would mean trying not to use a
constrained int.

I’m working though that chapter. Might be awhile before I get back to this
question. I need do think though some structural things.

ThanksOn Sunday, September 27, 2015 at 6:47:19 PM UTC-6, gmhwxi wrote:

You may need to first read Part IV of the following book:

Introduction to Programming in ATS

The simplest way I could come up with is based on abstract views; it is
given in the
following example:

https://github.com/githwxi/ATS-Postiats-test/blob/master/contrib/hwxi/TEST10/test15.dats

Here is the idea:

To implement Blink, let us introduce 3 states: setup, light_on and
light_off

Calling setup gets you to the ‘setup’ state. The transition functions are
listed below:

fun light_on1 : setup → light_on
fun light_on2 : light_off → light_on
fun light_off : light_on → light_off

Essentially, these functions correspond to certain edges in a state
diagram depicting
the ‘Blink’ system.

On Sunday, September 27, 2015 at 7:44:46 PM UTC-4, Mike Jones wrote:

I tried and failed. I got a compile error about over constraining a type,
which I think means my type is not indexed. I am using an Enumerative
Datatype as state, with value constructors Idle, Start, Working, etc. is it
possible to index it or do I have to index an int and some how create names
for each int with some function like

make_state (int) = State(int)

With some constraints on the int to one value?

My intent was to possibly carry a value in the datatype so the state
carried extra information with it. So that would mean trying not to use a
constrained int.

I tried and failed on indexing. Error message saying I over constrained a type. I think that means the type is not indexed.

In my case, I am using an Enumerative Datatype, and the value constructors are Idle, Working, etc. how do I index these?

Mike

Could you include the entire code for ‘process’?

On the surface, the C compiler may like the following style of coding
better:

fun proces(c: int, s: &state >> _): void =
val ch = cast{char}(c)
in
case 0 of
| _ when ch = ‘a’ => let
val a = read_hex()
in
s.addr := a; s.ps = idle
end
| …
endOn Saturday, September 26, 2015 at 7:57:52 PM UTC-4, Mike Jones wrote:

I am trying to understand the performance of my code running on an Arduino
Uno. I optimize -O2, and by pulsing digital pins and using a scope, I can
measure where times goes. Other than interrupts from the UART, the
measurements are pretty consistant.

I ran into some code that is a magnitude slower compared to other code. I
have put three dots where I measure, in the following code.

For reference, calls to things like read_hex() which read two chars from
the UART and make conversion calls, run in about 5us.

The time from calling process(,) to the third dot in the case, noting that
the case is 21th in the total of 24 cases, is 200us. Nothing else in the
entire program takes this much time.

How is this implemented in C? I don’t know the generated C well, so I am
hoping an expert can explain. I would have thought some kind of jump table,
but even 21 conditionals should not take 200us. Calls to read_hex() are
probably just as many instructions as 21 conditionals.

If this is a known inefficiency, is there a better way to code a big case
like this?

FYI, someone will probably say yuck to the cast. The int comes from the
outside world and conceptually the code is cleaner as characters for
reading. But I am open to better approaches.

Mike

fun process (c:int, s:state) : state = let
.
val ch = cast{char}(c)
.
val state = case 0 of
.
| _ when ch = ‘a’ => let
val a = read_hex()
in @{ addr = a, ps = idle } end

process(c, s)

‘case’ in ‘process’ is compiled into a sequence of if-then-else expressions.
It does not seem to be the cause for the slowdown. Note that ‘state’ is a
flat
struct. A C compiler may not do a good job optimizing code that returns a
flat struct as the coding style is a bit uncommon in C.On Saturday, September 26, 2015 at 8:35:30 PM UTC-4, gmhwxi wrote:

Could you include the entire code for ‘process’?

On the surface, the C compiler may like the following style of coding
better:

fun proces(c: int, s: &state >> _): void =
val ch = cast{char}(c)
in
case 0 of
| _ when ch = ‘a’ => let
val a = read_hex()
in
s.addr := a; s.ps = idle
end
| …
end

On Saturday, September 26, 2015 at 7:57:52 PM UTC-4, Mike Jones wrote:

I am trying to understand the performance of my code running on an
Arduino Uno. I optimize -O2, and by pulsing digital pins and using a scope,
I can measure where times goes. Other than interrupts from the UART, the
measurements are pretty consistant.

I ran into some code that is a magnitude slower compared to other code. I
have put three dots where I measure, in the following code.

For reference, calls to things like read_hex() which read two chars from
the UART and make conversion calls, run in about 5us.

The time from calling process(,) to the third dot in the case, noting
that the case is 21th in the total of 24 cases, is 200us. Nothing else in
the entire program takes this much time.

How is this implemented in C? I don’t know the generated C well, so I am
hoping an expert can explain. I would have thought some kind of jump table,
but even 21 conditionals should not take 200us. Calls to read_hex() are
probably just as many instructions as 21 conditionals.

If this is a known inefficiency, is there a better way to code a big case
like this?

FYI, someone will probably say yuck to the cast. The int comes from the
outside world and conceptually the code is cleaner as characters for
reading. But I am open to better approaches.

Mike

fun process (c:int, s:state) : state = let
.
val ch = cast{char}(c)
.
val state = case 0 of
.
| _ when ch = ‘a’ => let
val a = read_hex()
in @{ addr = a, ps = idle } end

process(c, s)