Could you please help me with the following issue.

Goal

Read a file on the client side (in the browser, via JS and HTML5 classes) line by line, without loading the whole file into memory.

Scenario

I'm working on a web page which should parse files on the client side. Currently, I'm reading the file as described in this article.

HTML:

<input type="file" id="files" name="files[]" />

JavaScript:

$("#files").on('change', function(evt){
    // creating FileReader
    var reader = new FileReader();

    // assigning handler
    reader.onloadend = function(evt) {      
        lines = evt.target.result.split(/
?
/);

        lines.forEach(function (line) {
            parseLine(...);
        }); 
    };

    // getting File instance
    var file = evt.target.files[0];

    // start reading
    reader.readAsText(file);
}

The problem is that FileReader reads the whole file at once, which crashes the tab for big files (size >= 300 MB). Using reader.onprogress doesn't solve the problem, as it just accumulates the result until it hits the limit.

Inventing a wheel

I've done some research on the internet and have found no simple way to do this (there are a bunch of articles describing this exact functionality, but on the server side for node.js).

The only way I see to solve it is the following (a rough sketch follows the list):

  1. Split the file into chunks (via the File.slice(startByte, endByte) method)
  2. Find the last newline character in that chunk ('\n')
  3. Read that chunk, excluding the part after the last newline character, convert it to a string and split it into lines
  4. Read the next chunk starting from the last newline character found in step 2
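
Here is a minimal sketch of that approach, assuming UTF-8 text and a hypothetical parseLine() callback; it carries the trailing partial line over as a string rather than re-slicing by byte offset, and note that slicing at arbitrary byte offsets can still split a multi-byte character at a chunk boundary:

function readFileByLines(file, parseLine, chunkSize) {
    chunkSize = chunkSize || 1024 * 1024; // 1 MB per chunk
    var offset = 0;
    var remainder = '';             // partial line carried over from the previous chunk
    var reader = new FileReader();

    reader.onload = function () {
        var text = remainder + reader.result;
        var lines = text.split(/\r?\n/);

        // keep the last (possibly incomplete) line for the next chunk
        remainder = lines.pop();

        lines.forEach(parseLine);

        if (offset < file.size) {
            readNextChunk();
        } else if (remainder.length > 0) {
            parseLine(remainder);   // flush the final line
        }
    };

    function readNextChunk() {
        var chunk = file.slice(offset, offset + chunkSize);
        offset += chunkSize;
        reader.readAsText(chunk);
    }

    readNextChunk();
}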

But I'd rather use something that already exists, to avoid entropy growth.

See Question&Answers more detail:os

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
thumb_up_alt 0 like thumb_down_alt 0 dislike
275 views
Welcome To Ask or Share your Answers For Others

1 Answer

Eventually I've created a new line-by-line reader, which is totally different from the previous one.

Features are:

  • Index-based access to File (sequential and random)
  • Optimized for repeated random reading (byte-offset milestones are saved for lines already navigated), so after you've read the whole file once, accessing line 43422145 will be almost as fast as accessing line 12.
  • Searching in file: find next and find all.
  • Exact index, offset and length of matches, so you can easily highlight them

Check this jsFiddle for examples.

Usage:

// Initialization
var file; // HTML5 File object
var navigator = new FileNavigator(file);

// Read some lines (best performance for sequential file reading)
navigator.readSomeLines(startingFromIndex, function (err, index, lines, eof, progress) { ... });

// Read an exact number of lines
navigator.readLines(startingFromIndex, count, function (err, index, lines, eof, progress) { ... });

// Find first from index
navigator.find(pattern, startingFromIndex, function (err, index, match) { ... });

// Find all matching lines
navigator.findAll(new RegExp(pattern), indexToStartWith, limitOfMatches, function (err, index, limitHit, results) { ... });
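
For instance, sequential reading of the whole file could look like this (a sketch based on the signatures above; the per-line handling is an assumption):

var navigator = new FileNavigator(file);

function readChunk(index) {
    navigator.readSomeLines(index, function (err, index, lines, eof, progress) {
        if (err) { console.error(err); return; }   // handle read errors

        lines.forEach(function (line) {
            // process each line here
        });

        if (!eof) {
            // continue from the first line not yet read
            readChunk(index + lines.length);
        }
    });
}

readChunk(0);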

Performance is the same as the previous solution. You can measure it by invoking 'Read' in the jsFiddle.

GitHub: https://github.com/anpur/client-line-navigator/wiki

