I want to create an open-source, lightweight, fast, and lazy in-memory XML
document store to support high-level queries:
I plan to use pugixml (http://pugixml.org/) to parse xml documents into
memory. The parsing is in-situ without string copying. All nodes are
wrapped references to substrings.
Queries return references to nodes or nodesets. Pugixml already supports
XPath, but the library will also support queries over multiple documents.
Query results can be cached by the library so that the higher-level user of
the library can be functional or declarative without worrying about
memoization. I am thinking of doing memoization using C-Memo (http://sourceforge.net/projects/c-memo/).
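To make the caching idea concrete, here is a minimal sketch in C++ of a per-set query cache. It does not use C-Memo or pugixml; `NodeRef`, `NodeSet`, and `QueryCache` are hypothetical names, and the evaluator callback stands in for whatever actually runs the XPath or cross-document query.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

// Hypothetical stand-in for a node reference; a real version would
// probably wrap a pugixml node handle instead of an index.
using NodeRef = std::size_t;
using NodeSet = std::vector<NodeRef>;

// Memoizes query results by query string. One cache per document set is
// assumed, so dropping the cache together with the set avoids stale results.
class QueryCache {
public:
    using Evaluator = std::function<NodeSet(const std::string&)>;

    explicit QueryCache(Evaluator eval) : eval_(std::move(eval)) {
        assert(eval_ && "evaluator must be callable");
    }

    // Returns the cached node set, evaluating the query only on a miss.
    const NodeSet& run(const std::string& query) {
        auto it = cache_.find(query);
        if (it == cache_.end()) {
            ++misses_;
            it = cache_.emplace(query, eval_(query)).first;
        }
        return it->second;
    }

    // Forgets all results, e.g. when the underlying set is released.
    void clear() { cache_.clear(); }

    std::size_t misses() const { return misses_; }

private:
    Evaluator eval_;
    std::unordered_map<std::string, NodeSet> cache_;
    std::size_t misses_ = 0;
};
```

The caller just calls `run` with the same query as often as it likes; only the first call per query reaches the evaluator, which is what lets the host-language side stay declarative.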
The library only needs to provide non-granular memory management. The
documents are processed in sets. Multiple sets can be loaded and queried.
Sets will be released explicitly. When a set is released, all the associated
DOMs will be released and all the references will be invalidated. Many sets
of documents can be loaded and unloaded by the user, so while granularity is
not important, it is a requirement not to leak memory significantly over
long sessions.
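The set-level lifetime described above could look roughly like the following C++ sketch. Everything here is an assumption for illustration: `Document` is a stand-in for a parsed DOM plus its in-situ buffer, and `Store`, `SetId`, and `NodeHandle` are hypothetical names. Set ids are never reused, so a handle into a released set simply fails to resolve instead of dangling.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

// Hypothetical stand-in for a parsed DOM; a real version would own the
// parser's document object and the buffer it was parsed from in place.
struct Document {
    std::string buffer;
};

using SetId = std::uint32_t;

// A reference handed out to callers: it names a set and a document in it.
struct NodeHandle {
    SetId set;
    std::size_t doc;
};

class Store {
public:
    // Loads a batch of documents as one set and returns its id.
    SetId load_set(std::vector<std::string> texts) {
        SetId id = next_id_++;
        auto& docs = sets_[id];
        for (auto& text : texts) {
            docs.push_back(Document{std::move(text)});
        }
        return id;
    }

    // Releases every document in the set at once; returns false if the
    // set was unknown or already released.
    bool release_set(SetId id) { return sets_.erase(id) > 0; }

    // Returns nullptr for a handle whose set has been released.
    const Document* resolve(const NodeHandle& h) const {
        auto it = sets_.find(h.set);
        if (it == sets_.end() || h.doc >= it->second.size()) {
            return nullptr;
        }
        return &it->second[h.doc];
    }

private:
    std::unordered_map<SetId, std::vector<Document>> sets_;
    SetId next_id_ = 1;  // ids are not reused, so stale handles stay invalid
};
```

Because all memory for a set hangs off one map entry, releasing the set frees it in one step, and no per-node bookkeeping is needed to avoid leaks across long sessions.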
I want it to be easy to integrate with Prolog (or other high level
languages like OCaml), meaning we want to play nice with host language data
structures and its GC.
I would like to use ATS for this. (The alternative is likely C++). As I am
new to ATS I am still trying to develop a mental picture of how to best
take advantage of ATS’s strength. Since memory management of the lower
level (xml parsing, querying and caching) is non-granular it should be
fairly straightforward – that is hopefully I don’t have to deal with the
heavy cannons of ATS like linear types in the beginning. ATS pattern
matching seems the ideal mechanism to bridge the semantics of the higher
level API but I don’t want it to leak too much without a GC.
Haitao
On Sunday, January 12, 2014 8:56:19 AM UTC-8, gmhwxi wrote:
This all depends on how Prolog creates/manipulates heap-allocated values.
If a library for Prolog can be implemented in ATS in ML-style, then it
should already be possible to do it in ML. So I have doubts about the
viability of such an approach.
However, one possibility is that you first implement in ML-style but use a
separate heap to store ATS values; this will create memory leaks; then you
employ linear types to eliminate these leaks.
If you have a concrete case, I will be happy to take a look.
This sounds like a project that can really take advantage of ATS.
Basically, you can use ATS as some sort of front-end to generate
C code.
You should start with datatypes (instead of dataviewtypes)
and do not worry about memory-leaks. It is fairly straightforward
to replace datatypes with dataviewtypes later to stop memory leaks.
This is one big advantage of using ATS.
Here is a concrete example showing how leaks can be eliminated:
If possible, malloc/free for ATS code should be independent of the GC for
the host language. Also, try to abstract types in ATS to handle host
language data.
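As a rough illustration of keeping allocation separate from the host GC, here is a sketch of a C-compatible boundary written in C++ rather than ATS. The `xs_` names are hypothetical, not part of any existing library. The host language only ever holds an opaque pointer and must call the release function itself; nothing behind the pointer is visible to, or managed by, the host's collector.

```cpp
#include <cstddef>
#include <new>
#include <string>
#include <vector>

extern "C" {
// Opaque to the host: it sees only a pointer, never the layout.
typedef struct xs_set xs_set;

xs_set* xs_set_create(void);
int xs_set_add(xs_set* set, const char* xml_text);
std::size_t xs_set_count(const xs_set* set);
void xs_set_release(xs_set* set);
}

// Private definition; all of its memory comes from the C++ heap.
struct xs_set {
    std::vector<std::string> documents;
};

extern "C" xs_set* xs_set_create(void) {
    return new (std::nothrow) xs_set();  // nullptr on allocation failure
}

// Copies the text into the set; returns 0 on success, -1 on error.
extern "C" int xs_set_add(xs_set* set, const char* xml_text) {
    if (set == nullptr || xml_text == nullptr) return -1;
    try {
        set->documents.emplace_back(xml_text);
    } catch (...) {
        return -1;  // never let an exception cross the C boundary
    }
    return 0;
}

extern "C" std::size_t xs_set_count(const xs_set* set) {
    return set != nullptr ? set->documents.size() : 0;
}

// Frees the set and everything it owns; the pointer is invalid afterwards.
extern "C" void xs_set_release(xs_set* set) {
    delete set;
}
```

On the host side the pointer would typically be wrapped in whatever foreign-handle mechanism the language offers, with `xs_set_release` called explicitly when the set is no longer needed.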
Have fun!