Welcome to ShenZhenJia Knowledge Sharing Community for programmer and developer-Open, Learning and Share

I use tesseract.js to detect numbers in Node.js. For example, this is my image:

enter image description here

I run my script and it detects something like this:

289 ,0

Because of noise in the image, it also picks up spaces and other symbols such as commas.

Is there any way to make it return only digits, with no other characters such as spaces and commas?

Also this is my code:

tesseract.recognize(
    __dirname + '/Captcha.png',
    'eng',
    { logger: m => console.log(m) }
).then(({ data: { text } }) => {
    console.log(text);
});


1 Answer

I don't know the tesseract.js API, but there is a fairly simple workaround: filter the recognized text afterwards so that only the digits remain:

tesseract.recognize(
    __dirname + '/Captcha.png',
    'eng',
    { logger: m => console.log(m) }
).then(({ data: { text } }) => {
    // Keep only the digit characters (\d); spaces, commas, etc. are dropped.
    const filteredText = Array.from(text.matchAll(/\d/g)).join("")
    console.log(filteredText)
})
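
The same filtering step can also be written as a small reusable function. This is only a sketch: `digitsOnly` is a hypothetical helper name, not part of tesseract.js, and it assumes that only the ASCII digits 0-9 should survive.

```javascript
// Hypothetical helper (not part of tesseract.js): keep only ASCII digits 0-9.
function digitsOnly(text) {
  // \D matches every character that is not a digit, so this strips
  // spaces, commas, newlines and any other OCR noise in one pass.
  return text.replace(/\D/g, "");
}

console.log(digitsOnly("289 ,0\n")); // prints "2890"
```

Inside the `.then(...)` callback, this would be used as `console.log(digitsOnly(text))`.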

Here's the test for just the filtering function:

if (Array.from("209, 1".matchAll(/\d/g)).join("") !== "2091") {
  throw new Error("Not working")
}
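
As a possible alternative to filtering afterwards, Tesseract has a `tessedit_char_whitelist` parameter, and tesseract.js workers accept it through `worker.setParameters`. The sketch below assumes `worker` is an already-initialized tesseract.js worker (created with `createWorker`; the setup steps differ between library versions, so they are left to the caller). `recognizeDigits` is a hypothetical name, and whether the whitelist is fully honored may depend on the Tesseract engine and version in use.

```javascript
// Sketch: restrict recognition to digits with Tesseract's character whitelist.
// `worker` is assumed to be an initialized tesseract.js worker; creating and
// initializing it is version-specific and is left to the caller.
async function recognizeDigits(worker, imagePath) {
  // Ask Tesseract to consider only the characters 0-9.
  await worker.setParameters({ tessedit_char_whitelist: "0123456789" });
  const { data: { text } } = await worker.recognize(imagePath);
  // Drop the surrounding whitespace that OCR output usually carries.
  return text.trim();
}
```

If stray characters still appear in the result, the post-filtering approach from this answer can be applied on top of the returned text.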
