Regular expressions progress

Looks like I’m going to wind up using the Plan 9 regexp library after all. After reading some very informative articles by Russ Cox (this, this, and this), I think I now understand the internal workings of the Plan 9 regexp library well enough to modify it: first so that it works on Arcueid strings, and then to add some functionality that makes it more useful, though it will still not be entirely compatible with Perl or POSIX regexps. In particular, I don’t think I ever want Arcueid regexps to support backreferences, which in practice can only be implemented with a backtracking matcher. However, I do hope to add enough Perl/POSIX regexp features to make it useful for most workaday uses of regexps.
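For readers who haven’t seen Cox’s articles, here is a toy sketch of the idea behind the Plan 9 library: compile the pattern into a small instruction list, then step a *set* of NFA states through the input in lockstep instead of backtracking. This is illustrative Python, not the Plan 9 C source; the names `parse`, `compile_regex`, `add_thread` and `full_match` are hypothetical, and it only handles literals, `.`, `*`, `+`, `?`, `|` and parentheses, matching against the whole string.

```python
# Toy Thompson-NFA matcher (Pike-VM style), a sketch of the technique
# described by Russ Cox; not the Plan 9 source code.

def parse(pattern):
    """Recursive-descent parser producing a small AST of tuples."""
    pos = 0

    def alt():
        nonlocal pos
        node = concat()
        while pos < len(pattern) and pattern[pos] == "|":
            pos += 1
            node = ("alt", node, concat())
        return node

    def concat():
        node = ("empty",)
        while pos < len(pattern) and pattern[pos] not in "|)":
            node = ("cat", node, repeat())
        return node

    def repeat():
        nonlocal pos
        node = atom()
        while pos < len(pattern) and pattern[pos] in "*+?":
            node = ({"*": "star", "+": "plus", "?": "quest"}[pattern[pos]], node)
            pos += 1
        return node

    def atom():
        nonlocal pos
        c = pattern[pos]
        pos += 1
        if c == "(":
            node = alt()
            if pos >= len(pattern) or pattern[pos] != ")":
                raise ValueError("missing ')'")
            pos += 1
            return node
        return ("any",) if c == "." else ("char", c)

    tree = alt()
    if pos != len(pattern):
        raise ValueError("unbalanced ')'")
    return tree


def compile_regex(pattern):
    """Flatten the AST into instructions: char, any, split, jmp, match."""
    prog = []

    def emit(node):
        kind = node[0]
        if kind in ("char", "any"):
            prog.append(node)
        elif kind == "cat":
            emit(node[1])
            emit(node[2])
        elif kind == "alt":
            split = len(prog)
            prog.append(None)  # placeholder, patched once targets are known
            emit(node[1])
            jmp = len(prog)
            prog.append(None)
            prog[split] = ("split", split + 1, len(prog))
            emit(node[2])
            prog[jmp] = ("jmp", len(prog))
        elif kind in ("star", "quest"):
            split = len(prog)
            prog.append(None)
            emit(node[1])
            if kind == "star":
                prog.append(("jmp", split))  # loop back for another repetition
            prog[split] = ("split", split + 1, len(prog))
        elif kind == "plus":
            start = len(prog)
            emit(node[1])
            prog.append(("split", start, len(prog) + 1))
        # "empty" emits nothing

    emit(parse(pattern))
    prog.append(("match",))
    return prog


def add_thread(prog, pc, threads, seen):
    """Follow jmp/split without consuming input; each pc is added at most once."""
    if pc in seen:
        return
    seen.add(pc)
    op = prog[pc]
    if op[0] == "jmp":
        add_thread(prog, op[1], threads, seen)
    elif op[0] == "split":
        add_thread(prog, op[1], threads, seen)
        add_thread(prog, op[2], threads, seen)
    else:
        threads.append(pc)


def full_match(pattern, text):
    """True if all of text matches pattern; all threads advance one char at a time."""
    prog = compile_regex(pattern)
    clist = []
    add_thread(prog, 0, clist, set())
    for ch in text:
        nlist, seen = [], set()
        for pc in clist:
            op = prog[pc]
            if op[0] == "any" or (op[0] == "char" and op[1] == ch):
                add_thread(prog, pc + 1, nlist, seen)
        clist = nlist
    return any(prog[pc][0] == "match" for pc in clist)
```

Because the thread list never holds more entries than there are instructions, a pattern like `"a?" * 25 + "a" * 25` against 25 a’s finishes quickly here, while a naive backtracking matcher can take exponential time on it (this is the example from Cox’s first article). It also suggests why backreferences don’t fit: a state in the set carries no record of what text a group captured, so there is nothing to compare a `\1` against.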

I’ve also changed the regexp syntax slightly. A regexp in Arcueid now looks sorta like r/…/. There seems to be no easy way to unambiguously parse a regexp in the Perl-like /…/ syntax while also having operators like / for division. The r in front at least makes it unambiguous.
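To see why the prefix helps, consider a reader that meets a bare `/`: in `(/ x 2)` it is the division operator, but under a /…/ syntax it could just as well open a regexp, and the reader would need unbounded lookahead to tell. The sketch below is a hypothetical tokenizer, not Arcueid’s actual reader. It assumes `r/` at the start of a token is reserved for regexp literals, that the body runs to the next unescaped `/`, and that escapes are kept verbatim for the regexp compiler; `tokenize` and the token kinds are made-up names.

```python
def tokenize(src):
    """Split a tiny Lisp-like source string into (kind, text) tokens."""
    tokens = []
    i = 0
    while i < len(src):
        c = src[i]
        if c.isspace():
            i += 1
        elif c in "()":
            tokens.append(("paren", c))
            i += 1
        elif src.startswith("r/", i):
            # Assumed rule: "r/" at token start always opens a regexp literal.
            j = i + 2
            body = []
            while j < len(src) and src[j] != "/":
                if src[j] == "\\" and j + 1 < len(src):
                    body.append(src[j:j + 2])  # keep escape, e.g. \/ stays \/
                    j += 2
                else:
                    body.append(src[j])
                    j += 1
            if j >= len(src):
                raise SyntaxError("unterminated regexp literal")
            tokens.append(("regexp", "".join(body)))
            i = j + 1  # skip the closing '/'
        else:
            # Anything else, including a lone '/', is an ordinary symbol.
            j = i
            while j < len(src) and not src[j].isspace() and src[j] not in "()":
                j += 1
            tokens.append(("symbol", src[i:j]))
            i = j
    return tokens
```

With this rule `(/ x 2)` tokenizes as a call to the `/` symbol, while `(f r/a(b|c)*/ s)` yields a single regexp token `a(b|c)*`; the decision is made from the first two characters of the token, with no backtracking in the reader.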

At the moment Arcueid’s strings are simple arrays of Unicode runes (UCS-4), but I plan to make them much more sophisticated by turning them into ropes. This should also make it much more efficient to apply cons operations like car and cdr to strings, which is something Paul Graham has considered.
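A rope is a binary tree whose leaves hold pieces of text and whose internal nodes cache the total length of their subtrees. The sketch below is a generic illustration of why that helps car and cdr, not Arcueid’s planned design. It assumes leaves are read-only views (buffer, offset, length) into a shared string, so taking the cdr allocates a few new nodes along one path instead of copying the remaining characters as a flat array would. All names (`Leaf`, `Node`, `concat`, `char_at`, `slice_rope`, `car`, `cdr`, `to_str`) are hypothetical.

```python
class Leaf:
    """A read-only view of buf[start:start+length]."""
    __slots__ = ("buf", "start", "length")

    def __init__(self, buf, start=0, length=None):
        self.buf = buf
        self.start = start
        self.length = len(buf) - start if length is None else length


class Node:
    """Concatenation of two ropes, with the combined length cached."""
    __slots__ = ("left", "right", "length")

    def __init__(self, left, right):
        self.left, self.right = left, right
        self.length = left.length + right.length


EMPTY = Leaf("", 0, 0)


def concat(a, b):
    if a.length == 0:
        return b
    if b.length == 0:
        return a
    return Node(a, b)


def char_at(rope, i):
    """Walk down the tree using cached lengths: O(depth), not O(n)."""
    while isinstance(rope, Node):
        if i < rope.left.length:
            rope = rope.left
        else:
            i -= rope.left.length
            rope = rope.right
    return rope.buf[rope.start + i]


def slice_rope(rope, start, end):
    """Rope for characters [start, end), sharing structure with the original."""
    if start <= 0 and end >= rope.length:
        return rope
    if isinstance(rope, Leaf):
        start, end = max(start, 0), min(end, rope.length)
        return Leaf(rope.buf, rope.start + start, max(end - start, 0))
    left_len = rope.left.length
    left = slice_rope(rope.left, start, min(end, left_len)) if start < left_len else EMPTY
    right = slice_rope(rope.right, start - left_len, end - left_len) if end > left_len else EMPTY
    return concat(left, right)


def car(rope):
    if rope.length == 0:
        raise IndexError("car of empty string")
    return char_at(rope, 0)


def cdr(rope):
    return slice_rope(rope, 1, rope.length)


def to_str(rope):
    if isinstance(rope, Leaf):
        return rope.buf[rope.start:rope.start + rope.length]
    return to_str(rope.left) + to_str(rope.right)
```

For example, with `r = concat(Leaf("hello, "), Leaf("world"))`, `cdr(r)` shares the `"world"` leaf unchanged and only creates a new view into `"hello, "`. Repeated cdrs never deepen the tree, so each car or cdr stays proportional to the tree depth rather than the string length.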

~ by stormwyrm on 2013-05-09.
