How could I add regex (regular expression) support to ParseJS using my current method of finding tokens?

This content originally appeared on DEV Community and was authored by Calin Baenen

So, ParseJS is a tokenization library I made for JavaScript.
Its core feature is to split a string into an array of characters (strings whose length is exactly one) and symbols, where each symbol is a stand-in for a token. (Tokens are symbols so you can easily tell a token from a character.)

Anyways. Where am I going with this?
Well, ParseJS is good, but it's not great.

You can statically parse tokens, like so:

// Parameters: (str:string, toks:string[])
parse_string("test12 test1 test2 test", [
  "test",
  "test1",
  "test12",
  "test2"
]);

and it will reliably produce:

[
  Symbol.for("test12"),
  ' ',
  Symbol.for("test1"),
  ' ',
  Symbol.for("test2"),
  ' ',
  Symbol.for("test")
]
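Since tokens come back as symbols and everything else as one-character strings, a consumer can tell them apart with `typeof`. Here is a minimal sketch of that, using the output above; `describe` is a hypothetical helper I made up for illustration, not something ParseJS provides.

```javascript
// The array below is the result shown above, written out by hand.
const result = [
  Symbol.for("test12"), ' ',
  Symbol.for("test1"), ' ',
  Symbol.for("test2"), ' ',
  Symbol.for("test")
];

// describe() is a hypothetical helper, not part of ParseJS.
function describe(item) {
  // Symbols are tokens; anything else is a plain one-character string.
  return typeof item === "symbol"
    ? `token(${Symbol.keyFor(item)})`
    : `char(${JSON.stringify(item)})`;
}

const described = result.map(describe);
console.log(described.join(" "));
```

`Symbol.keyFor` works here because the tokens are created with `Symbol.for`, which puts them in the global symbol registry.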

BUT there is no way of creating abstract groups of tokens (e.g. variable names, which can be practically anything, because the language doesn't name them for you).

  • What I have:
parse_string("class Test: end", [
  "class",
  ':',
  "end"
]);
// -> [Sym(class), ' ', 'T', 'e', ..., Sym(:), ...]
  • What I want:
parse_string("class Test: end", [
  "class",
  ':',
  "end",
  /[^0-9\W]\w+/ // 'g' flag added automatically.
]);
// -> [Sym(class), ' ', Sym(Test), Sym(:), ' ', Sym(end)]

The goal:

  • Add regex support to allow abstract token groups to exist.

How I find tokens:

  • Loop through each string in toks and collect the first character of each string in epl.
  • Loop through each character of str as c, and if c is in epl, slice the next few characters ahead to see if a valid keyword exists.
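To make those two steps concrete, here is a rough sketch of the lookup. It is a re-creation for illustration, not ParseJS's actual source: `parseStringSketch` is a hypothetical name, and trying longer keywords first (so "test12" beats "test") is my assumption about how ties get resolved.

```javascript
// Hypothetical re-creation of the lookup described above; not ParseJS's real code.
function parseStringSketch(str, toks) {
  // Ignore empty keywords so the loop always advances.
  const words = toks.filter((tok) => tok.length > 0);
  // epl: the set of first characters of every keyword.
  const epl = new Set(words.map((tok) => tok[0]));
  // Assumption: try longer keywords first so "test12" wins over "test".
  const byLength = [...words].sort((a, b) => b.length - a.length);

  const out = [];
  let i = 0;
  while (i < str.length) {
    const c = str[i];
    let matched = null;
    if (epl.has(c)) {
      // Slice ahead by each keyword's length and compare.
      matched = byLength.find((tok) => str.slice(i, i + tok.length) === tok) ?? null;
    }
    if (matched !== null) {
      out.push(Symbol.for(matched)); // token
      i += matched.length;
    } else {
      out.push(c); // plain character
      i += 1;
    }
  }
  return out;
}

const tokens = parseStringSketch("test12 test1 test2 test", [
  "test", "test1", "test12", "test2"
]);
```

Note that the slice size always comes from a keyword's `length`, which is exactly the piece a regex can't supply.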

The challenge(s):

  • Unlike strings, the length that a regex represents can be variable and would need to be computed.
  • The way I check for tokens is by seeing if the current character is the first character of a keyword. But I can't exactly do that with a regex, since there's no subscript operator, or any other way to get the character (or potential characters) a regex can start with.
  • I slice the substring to test based on the lengths of the keywords that exist. But, since I can't get the length(s) that a regex could match, I can't compute how big of a substring I need to slice to test.
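One direction that might sidestep all three challenges is JavaScript's sticky flag (`y`): a sticky regex only matches starting exactly at `lastIndex`, so the engine reports the match length itself and no first-character lookup or pre-sized slice is needed. Below is a minimal sketch under those assumptions. `toSticky` and `parseWithPatterns` are hypothetical names, not ParseJS functions, and "longest match wins, keywords win ties" is a rule I picked for the example.

```javascript
// Hypothetical sketch; toSticky() and parseWithPatterns() are not part of ParseJS.
function toSticky(re) {
  // Drop 'g' and 'y' if present, then add 'y' so exec() only matches at lastIndex.
  const flags = re.flags.replace(/[gy]/g, "") + "y";
  return new RegExp(re.source, flags);
}

function parseWithPatterns(str, toks) {
  const strings = toks
    .filter((t) => typeof t === "string" && t.length > 0)
    .sort((a, b) => b.length - a.length); // longest keyword first
  const patterns = toks.filter((t) => t instanceof RegExp).map(toSticky);

  const out = [];
  let i = 0;
  while (i < str.length) {
    // Longest keyword starting at position i, or "" if none.
    let best = strings.find((s) => str.startsWith(s, i)) ?? "";
    for (const re of patterns) {
      re.lastIndex = i;          // anchor the attempt at position i
      const m = re.exec(str);    // null unless the match starts at i
      // Assumed rule: a pattern only wins if it is strictly longer.
      if (m !== null && m[0].length > best.length) best = m[0];
    }
    if (best.length > 0) {
      out.push(Symbol.for(best));
      i += best.length;
    } else {
      out.push(str[i]);
      i += 1;
    }
  }
  return out;
}

const parsed = parseWithPatterns("class Test: end", [
  "class", ":", "end", /[^0-9\W]\w+/
]);
```

The trade-off is that every pattern gets tried at every position that isn't already settled by a longer keyword, so this is slower than the first-character check for strings.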


Calin Baenen | Sciencx (2021-12-16T08:24:11+00:00) How could I add regex (regular expression) support to ParseJS using my current method of finding tokens?. Retrieved from https://www.scien.cx/2021/12/16/how-could-i-add-regex-regular-expression-support-to-parsejs-using-my-current-method-of-finding-tokens/
