# html-tokenize

transform stream to tokenize html

[![build status](https://secure.travis-ci.org/substack/html-tokenize.png)](http://travis-ci.org/substack/html-tokenize)

# example

``` js
var fs = require('fs');
var tokenize = require('html-tokenize');
var through = require('through2');

// read the html file and split it into [ name, buffer ] rows
fs.createReadStream(__dirname + '/table.html')
    .pipe(tokenize())
    .pipe(through.obj(function (row, enc, next) {
        // row[1] is a buffer; convert it to a string for printing
        row[1] = row[1].toString();
        console.log(row);
        next();
    }))
;
```

this html:

``` html
<table>
 <tbody>blah blah blah</tbody>
 <tr><td>there</td></tr>
 <tr><td>it</td></tr>
 <tr><td>is</td></tr>
</table>
```

generates this output:

```
[ 'open', '<table>' ]
[ 'text', '\n ' ]
[ 'open', '<tbody>' ]
[ 'text', 'blah blah blah' ]
[ 'close', '</tbody>' ]
[ 'text', '\n ' ]
[ 'open', '<tr>' ]
[ 'open', '<td>' ]
[ 'text', 'there' ]
[ 'close', '</td>' ]
[ 'close', '</tr>' ]
[ 'text', '\n ' ]
[ 'open', '<tr>' ]
[ 'open', '<td>' ]
[ 'text', 'it' ]
[ 'close', '</td>' ]
[ 'close', '</tr>' ]
[ 'text', '\n ' ]
[ 'open', '<tr>' ]
[ 'open', '<td>' ]
[ 'text', 'is' ]
[ 'close', '</td>' ]
[ 'close', '</tr>' ]
[ 'text', '\n' ]
[ 'close', '</table>' ]
[ 'text', '\n' ]
```

# methods

``` js
var tokenize = require('html-tokenize');
```

## var t = tokenize()

Return a tokenize transform stream `t` that takes html input and produces rows
of output.

The output rows are of the form:

* `[ name, buffer ]`

The input stream maps completely onto the buffers from the object stream.

The types of names are:

* open
* close
* text

cdata, comments, and scripts all use `'open'` with their contents appearing in
subsequent `'text'` rows.

# usage

There is an html-tokenize command too.

```
usage: html-tokenize {FILE}

  Tokenize FILE into newline-separated json arrays for each tag.
  If FILE is not specified, use stdin.
```

# install

With [npm](https://npmjs.org), to get the library do:

```
npm install html-tokenize
```

or to get the command do:

```
npm install -g html-tokenize
```

# license

MIT
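
# row format sketch

The methods section says that output rows have the form `[ name, buffer ]` and that the input maps completely onto those buffers. The sketch below illustrates what that implies for a consumer: joining every buffer gives back the input, and filtering on the name picks out one kind of row. It does not call html-tokenize itself. The `rows` array is hand-written sample data, and `joinRows` and `textOnly` are hypothetical helper names, not part of the library.

```javascript
// Hand-written sample rows in the [ name, buffer ] shape (not captured output).
var rows = [
    [ 'open', Buffer.from('<td>') ],
    [ 'text', Buffer.from('there') ],
    [ 'close', Buffer.from('</td>') ]
];

// Hypothetical helper: concatenate every row's buffer to rebuild the input.
function joinRows (rows) {
    return Buffer.concat(rows.map(function (row) {
        return row[1];
    })).toString();
}

// Hypothetical helper: keep only the contents of 'text' rows.
function textOnly (rows) {
    return rows.filter(function (row) {
        return row[0] === 'text';
    }).map(function (row) {
        return row[1].toString();
    }).join('');
}

console.log(joinRows(rows)); // prints <td>there</td>
console.log(textOnly(rows)); // prints there
```

In a real pipeline the same logic could sit inside a `through.obj` handler like the one in the example section, collecting rows as they arrive instead of reading them from an array.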