Welcome to ShenZhenJia Knowledge Sharing Community for programmer and developer-Open, Learning and Share

I am writing a small program to process a big text file and do some replacements. The problem is that it never stops allocating new memory, so in the end it runs out of memory. I have reduced it to a simple program that only counts the number of lines (see the code below), and it still allocates more and more memory. I must admit that I know little about Boost, and Boost Spirit in particular. Could you please tell me what I am doing wrong? Thanks a million!

#include <string>
#include <iostream>
#include <boost/spirit/include/lex_lexertl.hpp>
#include <boost/bind.hpp>
#include <boost/ref.hpp>
#include <boost/spirit/include/support_istream_iterator.hpp>

// Token ids
enum token_ids {
    ID_EOL= 100
};

// Token definition
template <typename Lexer>
    struct var_replace_tokens : boost::spirit::lex::lexer<Lexer> {
        var_replace_tokens() {
            this->self.add("\n", ID_EOL); // newline characters
        }
    };

// Functor
struct replacer {
    typedef bool result_type;
    template <typename Token>
    bool operator()(Token const& t, std::size_t& lines) const  {
        switch (t.id()) {
        case ID_EOL:
            lines++;
            break;  
        }
        return true;
    }
}; 

int main(int argc, char **argv) {
    size_t lines=0;

    var_replace_tokens< boost::spirit::lex::lexertl::lexer< boost::spirit::lex::lexertl::token< boost::spirit::istream_iterator> > > var_replace_functor;

    std::cin.unsetf(std::ios::skipws);

    boost::spirit::istream_iterator first(std::cin);
    boost::spirit::istream_iterator last;

    bool r = boost::spirit::lex::tokenize(first, last, var_replace_functor,  boost::bind(replacer(), _1, boost::ref(lines)));

    if (r) {
        std::cerr << "Lines processed: " << lines << std::endl;
    } else {
        // Show the unconsumed input to help locate the failure
        std::string rest(first, last);
        std::cerr << "Processing failed at: " << rest << " (line " << lines << ")" << std::endl;
    }
}

1 Answer

The behaviour is by design.

  • Me: It must be the multi_pass iterator adaptor. Since there is no grammar, Spirit doesn't know when it can be flushed. [...]

  • You: As far as I know, istream_iterator takes care of reading the input stream without having to store the whole stream in memory.

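Yes, but you're not using std::istream_iterator. You're using boost::spirit::istream_iterator, which comes from Boost Spirit, a parser generator. Parsers need to revisit earlier input when they backtrack, so a single-pass input iterator is not enough.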

Spirit supports input iterators by adapting the input sequence to a forward (multi-pass) sequence with the multi_pass adaptor. This iterator adaptor stores a variable-size buffer [1] for backtracking purposes. Certain actions (expectation points, always-greedy operators like Kleene-*, etc.) tell the parser framework when it's safe to flush the buffer.

The Problem:

You're not parsing, just tokenizing. Nothing ever tells the iterator to flush its buffers.

The buffer is unbounded, so memory usage grows. Of course it's not a leak because as soon as the last copy of a multi-pass adapted iterator goes out of scope, the shared backtracking buffer is freed.
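To make the mechanism concrete, here is a toy model of a buffering adaptor. It is only a sketch: buffered_reader and peak_buffer are hypothetical names, and this is not how Spirit's multi_pass is actually implemented. It just shows why a backtracking buffer grows without bound when nothing ever flushes it, and stays small when something does.

```cpp
#include <cstddef>
#include <deque>
#include <istream>
#include <sstream>

// Toy stand-in for a buffering iterator adaptor (hypothetical, NOT Spirit's multi_pass).
class buffered_reader {
public:
    explicit buffered_reader(std::istream& in) : in_(in), pos_(0) {}

    // Read the next character and keep it in the buffer so that a
    // parser could rewind to it later. Returns false at end of input.
    bool next(char& c) {
        if (pos_ == buf_.size()) {
            char ch;
            if (!in_.get(ch)) return false;
            buf_.push_back(ch);
        }
        c = buf_[pos_++];
        return true;
    }

    // Backtrack to the oldest character that is still buffered.
    void rewind() { pos_ = 0; }

    // Promise never to rewind past the current position: drop consumed data.
    void flush() {
        buf_.erase(buf_.begin(), buf_.begin() + static_cast<std::ptrdiff_t>(pos_));
        pos_ = 0;
    }

    std::size_t buffered() const { return buf_.size(); }

private:
    std::istream& in_;
    std::deque<char> buf_;
    std::size_t pos_;
};

// Consume the whole stream, optionally flushing after every newline.
// Returns the largest buffer size observed along the way.
inline std::size_t peak_buffer(std::istream& in, bool flush_at_eol) {
    buffered_reader r(in);
    std::size_t peak = 0;
    char c;
    while (r.next(c)) {
        if (r.buffered() > peak) peak = r.buffered();
        if (flush_at_eol && c == '\n') r.flush();
    }
    return peak;
}
```

Without flushing, the peak equals the size of the whole input. With a flush at each newline, it is bounded by the longest line. Plain tokenizing corresponds to the first case.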

The Solution:

The simplest solution is to use a random access source. If you can, use a memory mapped file.

Other solutions would involve telling the multi-pass adaptor to flush. The simplest way to achieve this would be to use tokenize_and_parse. Even a faux grammar like *(any_token) should be enough to convince the parser framework that you will not be asking it to backtrack.

Inspiration:


[1] http://www.boost.org/doc/libs/1_62_0/libs/spirit/doc/html/spirit/support/multi_pass.html (by default it stores a shared deque). See it after running your test for a little while with: dd if=/dev/zero bs=1M | valgrind --tool=massif ./sotest


The Massif output clearly shows all the memory being allocated in the multi_pass split_std_deque storage policy:

100.00% (805,385,576B) (heap allocation functions) malloc/new/new[], --alloc-fns, etc.
->99.99% (805,306,368B) 0x4187D5: void boost::spirit::iterator_policies::split_std_deque::unique<char>::increment<boost::spirit::multi_pass<std::istream, boost::spirit::iterator_policies::default_policy<boost::spirit::iterator_policies::ref_counted, boost::spirit::iterator_policies::no_check, boost::spirit::iterator_policies::istream, boost::spirit::iterator_policies::split_std_deque> > >(boost::spirit::multi_pass<std::istream, boost::spirit::iterator_policies::default_policy<boost::spirit::iterator_policies::ref_counted, boost::spirit::iterator_policies::no_check, boost::spirit::iterator_policies::istream, boost::spirit::iterator_policies::split_std_deque> >&) (in /home/sehe/Projects/stackoverflow/sotest)
| ->99.99% (805,306,368B) 0x404BC3: main (in /home/sehe/Projects/stackoverflow/sotest)
