Welcome to ShenZhenJia Knowledge Sharing Community for programmer and developer-Open, Learning and Share

It is said that Java is 10x faster than Python in terms of performance, and that is what I see in benchmarks too. But what really drags Java down is the JVM startup time.

This is a test I made:

$time xlsx2csv.py Types of ESI v2.doc-emb-Package-9
...
<output skipped>
real    0m0.085s
user    0m0.072s
sys     0m0.013s


$time java  -jar -client /usr/local/bin/tika-app-0.7.jar -m Types of ESI v2.doc-emb-Package-9

real    0m2.055s
user    0m2.433s
sys     0m0.078s

Same file, a 12 KB MS XLSX file embedded inside a DOCX, and Python is about 24x faster! WTH!

Java takes 2.055 seconds.

I know it is all due to startup time, but I need to call it from a script to parse some documents, and I do not want to reinvent the wheel in Python.

For parsing 10k+ files, though, it is just not practical.
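To put a number on "not practical", here is a rough back-of-the-envelope calculation using the `real` times from the two runs above. It assumes exactly 10,000 files and that every file costs the same as the one measured, which is only an approximation.

```python
# Per-invocation wall-clock times copied from the `real` lines above.
JAVA_SECONDS = 2.055
PYTHON_SECONDS = 0.085
FILES = 10_000  # assumption: the question only says "10k+ files"


def total_hours(per_file_seconds: float, files: int = FILES) -> float:
    """Total wall-clock hours when every file pays the full per-run cost."""
    return per_file_seconds * files / 3600


if __name__ == "__main__":
    print(f"Java:   {total_hours(JAVA_SECONDS):.1f} h")           # Java:   5.7 h
    print(f"Python: {total_hours(PYTHON_SECONDS) * 60:.0f} min")  # Python: 14 min
    print(f"Ratio:  {JAVA_SECONDS / PYTHON_SECONDS:.1f}x")        # Ratio:  24.2x
```

So starting a fresh JVM per file would take on the order of hours, and almost all of that would be startup rather than parsing.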

Is there any way to speed it up? I already tried the -client option, and it only sped things up a little (about 20%).

Another idea: run it as a long-running daemon and communicate with it locally over UDP or Linux IPC sockets?
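A minimal sketch of the daemon idea, to show the shape of it rather than a finished solution. Everything here is hypothetical: the names `ParseHandler`, `start_worker`, and `parse_many`, the socket path, and the one-line-per-request protocol. The worker is a Python stand-in that only echoes the path back; in practice it would be a long-lived JVM process with the parser already loaded. It uses a Unix domain socket, so it is Unix-only.

```python
import os
import socket
import socketserver
import tempfile
import threading


class ParseHandler(socketserver.StreamRequestHandler):
    """Stand-in worker: reads one path per line, writes one reply per line."""

    def handle(self):
        for raw in self.rfile:
            path = raw.decode("utf-8").rstrip("\n")
            # A real worker would parse the document here; this just echoes.
            self.wfile.write(f"parsed:{path}\n".encode("utf-8"))


def start_worker(sock_path):
    """Start the stand-in worker on a Unix domain socket in a background thread."""
    server = socketserver.UnixStreamServer(sock_path, ParseHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


def parse_many(sock_path, paths):
    """Send every path over a single connection, so startup is paid only once."""
    results = []
    with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as client:
        client.connect(sock_path)
        with client.makefile("rwb") as stream:
            for path in paths:
                stream.write(path.encode("utf-8") + b"\n")
                stream.flush()
                results.append(stream.readline().decode("utf-8").rstrip("\n"))
    return results


def demo():
    """Start the worker, send two hypothetical file names, and shut down."""
    sock_path = os.path.join(tempfile.mkdtemp(), "parser.sock")
    server = start_worker(sock_path)
    try:
        return parse_many(sock_path, ["a.docx", "b.docx"])
    finally:
        server.shutdown()
        server.server_close()


if __name__ == "__main__":
    print(demo())
```

The point of the design is that the calling script keeps one connection open and streams file names through it, instead of launching a new process per file. The same client loop would work over a loopback TCP socket if that is easier to set up on the JVM side.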

1 Answer

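Try Nailgun. It keeps a single JVM running as a server and uses a small native client to run Java programs inside it, so the JVM startup cost is paid only once.

Note: I don't use it personally.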

