Node Stream tutorial with Highland JS
Efficiently process huge amounts of data with Node Stream
I encountered Node Stream at my day job, and due to my unfamiliarity with the concept, I had a hard time understanding what it is and how it works. This article explains my approach to learning the basics of Node Stream with Highland JS.
What is Stream Processing
Stream processing means handling data as soon as it becomes available, one chunk at a time. Batch processing, by contrast, means accumulating data over a certain period and then processing all of it at once.
Both techniques are used to process large amounts of data. An example of a data processing application is a leaderboard that shows the rank of players in an online game. With batch processing, you can sum up each player's score every hour and update the scoreboard in the background. With stream processing, you can update a player's total as soon as that player finishes a game and refresh the scoreboard immediately.
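To make the difference concrete, here is a minimal sketch of the leaderboard in both styles. The `results` data and the names `batchLeaderboard` and `createStreamingLeaderboard` are made up for illustration; a real game would read results from a database or a message queue.

```javascript
// Hypothetical game results: each entry is one finished game.
const results = [
  { player: 'ann', score: 10 },
  { player: 'bob', score: 7 },
  { player: 'ann', score: 5 },
];

// Batch: wait until all results for the period are collected,
// then total everything in a single pass.
function batchLeaderboard(allResults) {
  const totals = new Map();
  for (const { player, score } of allResults) {
    totals.set(player, (totals.get(player) ?? 0) + score);
  }
  return totals;
}

// Stream: keep running totals and update them on every incoming result,
// so the board is current after each finished game.
function createStreamingLeaderboard() {
  const totals = new Map();
  return {
    totals,
    onGameFinished({ player, score }) {
      totals.set(player, (totals.get(player) ?? 0) + score);
      return totals;
    },
  };
}
```

Both versions end up with the same totals. The difference is when the work happens and how much data has to be held at once: the batch version needs the whole list of results, while the streaming version only needs the current result and the running totals.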
Each type of data processing has pros and cons, but today we'll focus only on stream processing. One significant advantage of stream processing is memory efficiency: because you only hold one chunk at a time, you can work with a dataset that is too big to fit in local memory. To learn more, you can read this great article from freeCodeCamp.
Node Stream
A Node stream is an abstract interface for working with streaming data in NodeJS. I won't go into the details of this topic because I want to focus specifically on Highland JS.
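For orientation only, here is a small sketch of what a plain Node stream pipeline can look like, using the built-in `stream` module. The `gameResults` generator, the `totalScores` function, and the double-points transform are invented for this example; they are not part of any API.

```javascript
const { Readable, Transform, Writable } = require('node:stream');
const { pipeline } = require('node:stream/promises');

// Hypothetical source: yields one game result at a time.
function* gameResults() {
  yield { player: 'ann', score: 10 };
  yield { player: 'bob', score: 7 };
  yield { player: 'ann', score: 5 };
}

async function totalScores() {
  const totals = {};
  await pipeline(
    // Readable: turns the generator into an object-mode stream.
    Readable.from(gameResults()),
    // Transform: applies a made-up "double points" rule to each result.
    new Transform({
      objectMode: true,
      transform(result, _encoding, callback) {
        callback(null, { ...result, score: result.score * 2 });
      },
    }),
    // Writable: consumes each result and keeps a running total per player.
    new Writable({
      objectMode: true,
      write(result, _encoding, callback) {
        totals[result.player] = (totals[result.player] ?? 0) + result.score;
        callback();
      },
    }),
  );
  return totals;
}
```

Calling `totalScores()` returns a promise that resolves once every result has flowed through the three stages. `pipeline` also forwards errors and cleans up the streams, which is easy to get wrong when chaining `.pipe()` calls by hand.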
One pitfall I faced when using Node Stream for data processing is that you cannot resume from where you left off after the application crashes or receives a SIGTERM unless you implement some checkpointing mechanism. Otherwise, you must start from the beginning whenever your application or script stops.
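As a rough idea of what such a checkpointing mechanism could look like, here is a sketch that stores the index of the next unprocessed record in a JSON file. Everything here is an assumption made for illustration: the file location, the `nextIndex` field, the helper names, and the `stopAfter` parameter, which only exists to simulate a crash. `Readable.from` over an array stands in for a real source that can start reading at an offset.

```javascript
const fs = require('node:fs');
const os = require('node:os');
const path = require('node:path');
const { Readable } = require('node:stream');

// Hypothetical checkpoint location; a real job might use a database instead.
const checkpointFile = path.join(os.tmpdir(), 'stream-checkpoint-demo.json');

function loadCheckpoint() {
  try {
    return JSON.parse(fs.readFileSync(checkpointFile, 'utf8')).nextIndex;
  } catch {
    return 0; // no usable checkpoint yet: start from the beginning
  }
}

function saveCheckpoint(nextIndex) {
  fs.writeFileSync(checkpointFile, JSON.stringify({ nextIndex }));
}

// Streams records starting at the last checkpoint.
// `stopAfter` simulates the process dying after that many records.
async function processFromCheckpoint(records, handle, stopAfter = Infinity) {
  const start = loadCheckpoint();
  let index = start;
  for await (const record of Readable.from(records.slice(start))) {
    if (index - start === stopAfter) return; // simulated crash or SIGTERM
    handle(record);
    index += 1;
    saveCheckpoint(index); // record progress only after the record is handled
  }
}
```

Because the checkpoint is written after `handle` runs, a crash between the two steps means that record is handled again on restart, so `handle` should be safe to repeat. Writing the file after every record is also slow; saving every N records is a common trade-off.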
Highland JS
Highland JS is a utility library that helps you write stream processing code in NodeJS quickly. Highland JS has many features, including backpressure handling, which is essential for writing an efficient application.
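Backpressure means slowing down a fast producer so that a slower consumer is not flooded with data. To show what a library like Highland JS takes care of for you, here is a sketch of handling it by hand with only Node's built-in modules. This is not Highland code; `createSlowWriter` and `writeAll` are names I made up, and the 5 ms delay and buffer size of 2 are arbitrary.

```javascript
const { Writable } = require('node:stream');
const { once } = require('node:events');

// A deliberately slow consumer with a tiny buffer (2 objects).
function createSlowWriter(received) {
  return new Writable({
    objectMode: true,
    highWaterMark: 2,
    write(chunk, _encoding, callback) {
      // Pretend each item takes 5 ms to process.
      setTimeout(() => {
        received.push(chunk);
        callback();
      }, 5);
    },
  });
}

// Writes all items while respecting backpressure:
// when write() returns false, wait for 'drain' before sending more.
async function writeAll(writer, items) {
  let pauses = 0;
  for (const item of items) {
    if (!writer.write(item)) {
      pauses += 1;
      await once(writer, 'drain');
    }
  }
  writer.end();
  await once(writer, 'finish');
  return pauses; // how many times the producer had to wait
}
```

Without the `drain` check, the producer would keep calling `write()` and the pending items would pile up in memory, which defeats the memory efficiency that makes streams attractive in the first place.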