ATS project high level design

I want to create an open-source, lightweight, fast, and lazy in-memory XML
document store to support high-level queries:

  1. I plan to use pugixml (http://pugixml.org/) to parse XML documents into
    memory. The parsing is in-situ without string copying. All nodes are
    wrapped references to substrings.
  2. Queries return references to nodes or nodesets. Pugixml already supports
    XPath, but the library will also support queries over multiple documents.
    Query results can be cached by the library so that the higher-level user
    of the library can be functional or declarative without worrying about
    memoization. I am thinking of doing memoization using C-Memo
    (http://sourceforge.net/projects/c-memo/).
  3. The library only needs to provide non-granular memory management. The
    documents are processed in sets. Multiple sets can be loaded and queried.
    Sets will be released explicitly. When a set is released, all the
    associated DOMs will be released and all the references will be
    invalidated. Many sets of documents can be loaded and unloaded by the
    user, so while granularity is not important, it is a requirement not to
    leak memory significantly over long sessions.
  4. I want it to be easy to integrate with Prolog (or other high level
    languages like OCaml), meaning we want to play nice with host language data
    structures and its GC.
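
To make points 2 and 3 concrete, here is a minimal C++ sketch of set-based
ownership with memoized queries. It is only an illustration under stated
assumptions: Store, NodeRef, load_set, release_set, data and find_all are
hypothetical names, plain std::string buffers stand in for pugixml DOMs, and a
substring search stands in for XPath. Handles carry a set id that is never
reused, so releasing a set invalidates every handle into it and drops its
cache in one step.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical handle: names a substring of one document in one set.
struct NodeRef {
    std::uint64_t set_id;
    std::size_t doc;     // index of the document inside the set
    std::size_t offset;  // start of the substring in that document
    std::size_t length;
};

class Store {
    struct Set {
        std::vector<std::string> docs;                      // owned buffers
        std::map<std::string, std::vector<NodeRef>> cache;  // memoized results
    };
    std::map<std::uint64_t, Set> sets_;
    std::uint64_t next_id_ = 1;  // ids are never reused
    std::size_t scans_ = 0;      // counts real (non-cached) query runs

public:
    std::uint64_t load_set(std::vector<std::string> docs) {
        const std::uint64_t id = next_id_++;
        sets_[id].docs = std::move(docs);
        return id;
    }

    // Frees all buffers and cached results of the set in one step.
    bool release_set(std::uint64_t id) { return sets_.erase(id) > 0; }

    // Resolves a handle; returns nullptr if its set was released
    // or the range does not fit the document.
    const char* data(const NodeRef& r) const {
        const auto it = sets_.find(r.set_id);
        if (it == sets_.end() || r.doc >= it->second.docs.size()) return nullptr;
        const std::string& buf = it->second.docs[r.doc];
        if (r.offset > buf.size() || r.length > buf.size() - r.offset) return nullptr;
        return buf.data() + r.offset;
    }

    // Toy multi-document query (stand-in for XPath): every occurrence of
    // `needle` in every document of the set. Results are memoized per set.
    std::vector<NodeRef> find_all(std::uint64_t id, const std::string& needle) {
        const auto it = sets_.find(id);
        if (it == sets_.end() || needle.empty()) return {};
        Set& set = it->second;
        const auto hit = set.cache.find(needle);
        if (hit != set.cache.end()) return hit->second;
        ++scans_;
        std::vector<NodeRef> out;
        for (std::size_t d = 0; d < set.docs.size(); ++d) {
            const std::string& buf = set.docs[d];
            for (std::size_t pos = buf.find(needle); pos != std::string::npos;
                 pos = buf.find(needle, pos + 1)) {
                out.push_back(NodeRef{id, d, pos, needle.size()});
            }
        }
        set.cache[needle] = out;
        return out;
    }

    std::size_t scans() const { return scans_; }
};

// Usage: a handle resolves while its set is loaded and not afterwards.
inline void example_usage() {
    Store store;
    const std::uint64_t id = store.load_set({"<a>x</a>"});
    const std::vector<NodeRef> refs = store.find_all(id, "<a");
    assert(refs.size() == 1 && store.data(refs[0]) != nullptr);
    store.release_set(id);
    assert(store.data(refs[0]) == nullptr);
}
```

The handle check costs one map lookup per access. A real binding might
instead keep raw pointers and rely on the host never using them after
release, which is exactly the kind of invariant linear types in ATS could
enforce statically.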

I would like to use ATS for this. (The alternative is likely C++.) As I am
new to ATS, I am still trying to develop a mental picture of how best to
take advantage of ATS’s strengths. Since memory management of the lower
level (XML parsing, querying and caching) is non-granular, it should be
fairly straightforward; that is, hopefully I don’t have to deal with the
heavy cannons of ATS like linear types in the beginning. ATS pattern
matching seems the ideal mechanism to bridge the semantics of the higher
level API, but I don’t want it to leak too much without a GC.

Haitao

On Sunday, January 12, 2014 8:56:19 AM UTC-8, gmhwxi wrote:

This all depends on how Prolog creates/manipulates heap-allocated values.
If a library for Prolog can be implemented in ATS in ML-style, then it
should already be possible to do it in ML. So I have doubts about the
viability of such an approach.

However, one possibility is that you first implement in ML-style but use a
separate heap to store ATS values; this will create memory leaks; then you
employ linear types to eliminate these leaks.

If you have a concrete case, I will be happy to take a look.

This sounds like a project that can really take advantage of ATS.

Basically, you can use ATS as some sort of front-end to generate
C code.

You should start with datatypes (instead of dataviewtypes)
and do not worry about memory leaks. It is fairly straightforward
to replace datatypes with dataviewtypes later to stop memory leaks.
This is one big advantage of using ATS.

Here is a concrete example showing how leaks can be eliminated:

http://www.ats-lang.org/EXAMPLE/EFFECTIVATS/word-counting/

If possible, malloc/free for ATS code should be independent of the GC for
the host language. Also, try to abstract types in ATS to handle host
language data.
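
As a rough illustration of that last point (written in C++ rather than ATS,
since the generated code ends up behind a C interface either way), the host
language can be given only an opaque pointer plus explicit create/free
functions. All names here (xs_buffer, xs_buffer_new, xs_buffer_size,
xs_buffer_copy_out, xs_buffer_free) are hypothetical. Library memory comes
from malloc/free and is never scanned or moved by the host GC; data crosses
the boundary only by being copied into a host-owned buffer.

```cpp
#include <cstddef>
#include <cstdlib>
#include <cstring>

// The host language only ever sees an opaque pointer to this struct.
struct xs_buffer {
    char* data;
    std::size_t len;
};

extern "C" {

// Copies `len` bytes of `text` into library-owned memory.
// Returns nullptr if allocation fails.
xs_buffer* xs_buffer_new(const char* text, std::size_t len) {
    xs_buffer* b = static_cast<xs_buffer*>(std::malloc(sizeof(xs_buffer)));
    if (!b) return nullptr;
    b->data = static_cast<char*>(std::malloc(len + 1));
    if (!b->data) {
        std::free(b);
        return nullptr;
    }
    if (len > 0) std::memcpy(b->data, text, len);
    b->data[len] = '\0';
    b->len = len;
    return b;
}

std::size_t xs_buffer_size(const xs_buffer* b) { return b ? b->len : 0; }

// Copies at most `cap` bytes into a host-owned buffer and returns the
// number of bytes copied, so the host never holds pointers into
// library memory.
std::size_t xs_buffer_copy_out(const xs_buffer* b, char* dst, std::size_t cap) {
    if (!b || !dst) return 0;
    const std::size_t n = b->len < cap ? b->len : cap;
    if (n > 0) std::memcpy(dst, b->data, n);
    return n;
}

// Explicit release; safe to call with nullptr.
void xs_buffer_free(xs_buffer* b) {
    if (!b) return;
    std::free(b->data);
    std::free(b);
}

}  // extern "C"
```

A host binding (Prolog, OCaml, ...) would wrap the opaque pointer in its own
abstract type and call xs_buffer_free from an explicit release operation
rather than from a finalizer, matching the set-level release described in the
original post.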

Have fun!

On Saturday, January 18, 2014 1:44:50 AM UTC-5, H Zhang wrote:
