Full Version: Regexp issues with W^
The Black Wyrm's Lair - Forums > Mod development resources & discussion > Modder's Workshop
The Bigg
QUOTE(WizWom @ Aug 5 2006, 02:02 PM) *

W^ works fine, Infinity Engine works nicely with it.
Like I don't know how to escape a character in RegExp?

Give me W^, I like the way it looks. And I'm all the more sure no one else will mess with my files :-)

Believe me, it can create problems with run-time created regexps. I'm in a hurry, so I'll give a more detailed description of the problem later :)
WizWom
QUOTE(The Bigg @ Aug 19 2006, 09:08 AM) *

QUOTE(WizWom @ Aug 5 2006, 02:02 PM) *

W^ works fine, Infinity Engine works nicely with it.
Like I don't know how to escape a character in RegExp?

Give me W^, I like the way it looks. And I'm all the more sure no one else will mess with my files :-)

Believe me, it can create problems with run-time created regexps. I'm in a hurry, so I'll give a more detailed description of the problem later :)

Getting into the WeiDU code to see what you mean - yech, CAML sucks. And it is annoying that the source is written with bare LF line endings, all the more so since the main OS the game runs on uses CRLF.

OK, in a fit of insanity, '^' as a lexical token is string concatenation; inside [ ], ^ seems to indicate 'not' (why did they not use '+' and '!' like everyone else? Merci! Idiot Gauls)

['0'-'9''A'-'Z''a'-'z''_']['0'-'9''A'-'Z''a'-'z''#''_''-''.']* is the description of a 'STRING' data item in WeiDU - which means W^xyz would not be seen as one. Frankly, I fail to see why this is defined in so limited a fashion, but there it is.

However, this is for WeiDU LABELS... not, by any means, a string. And a character inside quotes (of any of the sorts WeiDU recognizes) won't be bothered one whit.

So, um, Bigg, unless I'm not reading the fricking source right, what is your beef with W^?
The Bigg
QUOTE(WizWom @ Aug 19 2006, 04:34 PM) *

['0'-'9''A'-'Z''a'-'z''_']['0'-'9''A'-'Z''a'-'z''#''_''-''.']* is the description of a 'STRING' data item in WeiDU - which means W^xyz would not be seen as one. Frankly, I fail to see why this is defined in so limited a fashion, but there it is.

Those are the valid tokens in BAF code. My problem with regexp-related characters is that at least two mods use code which contains
FILE_CONTAINS_REGEXP (somefile ~^%SOURCE_RES%$~). With w^something, it'd try to look for the invalid regexp ^w^something$, which contains an out-of-place beginning-of-string anchor, and that would cause the mod to fail to install. Since one of those mods is mine, I get bug reports for your modding choices.
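
To make the failure mode concrete, here is a minimal OCaml sketch using the Str library. It only imitates the situation described above, under the assumption that the resource name is pasted into the pattern unescaped; the helper name `naive_matches` and the sample names are made up for illustration and are not WeiDU code.

```ocaml
(* Sketch only: imitates pasting a resource name into "^...$" unescaped.
   Needs the Str library (e.g. run with `ocaml str.cma`). *)

(* Hypothetical helper: does [line] match the pattern built from [res]? *)
let naive_matches (res : string) (line : string) : bool =
  let pattern = Str.regexp ("^" ^ res ^ "$") in
  Str.string_match pattern line 0

let () =
  (* A plain name: the pattern is ^wsomething$ *)
  Printf.printf "plain: %b\n" (naive_matches "wsomething" "wsomething");
  (* A name containing '^': the pattern becomes ^w^something$, so the
     second '^' is read as an anchor rather than as a literal character. *)
  Printf.printf "caret: %b\n" (naive_matches "w^something" "w^something")
```

The first call reports a match and the second does not, even though the text being searched is identical to the name that was substituted in.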
WizWom
QUOTE(The Bigg @ Aug 24 2006, 10:47 AM) *

QUOTE(WizWom @ Aug 19 2006, 04:34 PM) *

['0'-'9''A'-'Z''a'-'z''_']['0'-'9''A'-'Z''a'-'z''#''_''-''.']* is the description of a 'STRING' data item in WeiDU - which means W^xyz would not be seen as one. Frankly, I fail to see why this is defined in so limited a fashion, but there it is.

Those are the valid tokens in BAF code. My problem with regexp-related characters is that at least two mods use code which contains
FILE_CONTAINS_REGEXP (somefile ~^%SOURCE_RES%$~). With w^something, it'd try to look for the invalid regexp ^w^something$, which contains an out-of-place beginning-of-string anchor, and that would cause the mod to fail to install. Since one of those mods is mine, I get bug reports for your modding choices.

Um... NO.

^ at the beginning of a Regexp means "start of line" - not the "^" character. It is the opposite end delimiter to the $. So, really, you should have no problem at all, if that regexp parser is working properly. If it's not, we've got more issues than this.

If the data in %SOURCE_RES% is w^^^$$$w, it would still get parsed correctly. I cannot imagine you getting confused on data/code issues like that.
The Bigg
QUOTE(WizWom @ Aug 26 2006, 01:52 PM) *

Um... NO.

^ at the beginning of a Regexp means "start of line" - not the "^" character. It is the opposite end delimiter to the $. So, really, you should have no problem at all, if that regexp parser is working properly. If it's not, we've got more issues than this.

If the data in %SOURCE_RES% is w^^^$$$w, it would still get parsed correctly. I cannot imagine you getting confused on data/code issues like that.

I'm pretty sure the OCaml regexp parser hiccups on that. Just to test it,

CODE
Valerio@acer-01vdcn9bdz ~              
$ ocaml str.cma
        Objective Caml version 3.09.0

# let x = Str.regexp "w^^^$$$w";;
val x : Str.regexp = <abstr>
# let f = "w^^^$$$w";;
val f : string = "w^^^$$$w"
# Str.string_match x f 0;;
- : bool = false
# Str.search_forward x f 0;;
Exception: Not_found.
# exit 0;;

Valerio@acer-01vdcn9bdz ~
$


I.E. the regexp "w^^^$$$w" is not found inside the string "w^^^$$$w".
WizWom
QUOTE(The Bigg @ Aug 26 2006, 09:17 AM) *

QUOTE(WizWom @ Aug 26 2006, 01:52 PM) *

Um... NO.

^ at the beginning of a Regexp means "start of line" - not the "^" character. It is the opposite end delimiter to the $. So, really, you should have no problem at all, if that regexp parser is working properly. If it's not, we've got more issues than this.

If the data in %SOURCE_RES% is w^^^$$$w, it would still get parsed correctly. I cannot imagine you getting confused on data/code issues like that.

I'm pretty sure the OCaml regexp parser hiccups on that. Just to test it,

I.E. the regexp "w^^^$$$w" is not found inside the string "w^^^$$$w".

Silly... if you want to look for "^" or "$" in a regexp, you have to escape them, because they are special characters. The regexp "w^^^$$$w" would only find a 'w', followed by 5 blank lines, then another 'w'. To find "w^^^$$$w" would require the regexp "w\^\^\^\$\$\$w" - or any of a number of other combinations - "w.*w" being the simplest.
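
As a quick check of the escaping argument, the sketch below hand-escapes the pattern with OCaml's Str library, the same library used in the toplevel session earlier in the thread. OCaml string literals need each backslash doubled, so the regexp w\^\^\^\$\$\$w is written "w\\^\\^\\^\\$\\$\\$w". The variable names are illustrative.

```ocaml
(* Sketch: the hand-escaped form of the pattern, using the Str library
   (e.g. run with `ocaml str.cma`). *)
let subject = "w^^^$$$w"

(* Regexp w\^\^\^\$\$\$w; each backslash is doubled inside an OCaml string. *)
let escaped = Str.regexp "w\\^\\^\\^\\$\\$\\$w"

let () =
  Printf.printf "escaped pattern matches: %b\n"
    (Str.string_match escaped subject 0)
```

With every special character escaped, the pattern matches the literal text.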
The Bigg
Which would prove that W^something can cause problems in existing scenarios, since WeiDU would be looking for line breaks whereas it should be looking for a '^' character, thus causing the algorithm to fail.
WizWom
Ah, you still are getting confused with data/code issues.

Yes, if somehow I used the file or resource name as a regexp without escaping the '^', then I would have trouble. Which was why I made the initial post I did, about knowing how to. But whyever would I do that?

In the normal course of events, where a regexp you have written looks at the resource names I have made, it will matter not a whit that the character it finds is a '^'; in a regular expression, the character '^' is matched not by '^' but by '\^'. '^' indicates a beginning of line to a regexp, and no more gets matched against the character '^' than '$' would.

Edit: I suppose, thinking of it, one might wish to make a check for references to a certain file, but then, that is what double quotes are for, i.e., ~"%FILE_NAME%"~ (or escaped however you would have to in order to get the double quotes around the file name you are checking for through to the regexp parser; I know not CAML)
The Bigg
QUOTE(WizWom @ Aug 26 2006, 06:45 PM) *

Edit: I suppose, thinking of it, one might wish to make a check for references to a certain file, but then, that is what double quotes are for, i.e., ~"%FILE_NAME%"~ (or escaped however you would have to in order to get the double quotes around the file name you are checking for through to the regexp parser; I know not CAML)

Yes, while ~^"%FILE_NAME%"$~ would work, I'd have to re-publish the mod to change this little tidbit, which isn't very funny ;)
WizWom
OK, let me get this straight, your mod is, for some crazy reason, searching for all references to arbitrary files? Whatever would you be doing that for?

Edit: the old wombat heads off to find and explore your TP2 files...

Well, The Bigg Tweaks installed nicely, with no confusion with W^*.cre and W^*.SPL and W^*.BCS and W^*.ARE files in the override.

The Bigg Quest (well, battle) installed without incident.

Refinements is confused, thinks I don't have ToB.
MTP and stivan install.

And they all, of course, uninstall. Because I didn't really want to play with any of them (no offense).

Baronius
If it is possible to solve the issue by replacing certain code in a few existing mods, I don't think it's justified to deny a whole range of prefix variations from all modders.
The Bigg
QUOTE(WizWom @ Aug 27 2006, 03:17 AM) *

OK, let me get this straight, your mod is, for some crazy reason, searching for all references to arbitrary files? Whatever would you be doing that for?

Edit: the old wombat heads off to find and explore your TP2 files...

A modification to Virtue explores all CRE files, looks up their DV and figures out script and P dialogue, and tweaks the said files; afterwards, it appends to a list the names of the *P.d files and the BCS files it modified, to avoid patching them twice. The method to check if a given file is in said list involves regexps and would fail (as of now) with w^something in a DV. Similar problems should arise with Revised Armors from Refinements.
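
One way to sidestep the problem for this kind of "already patched" list is to skip regexps for the membership test altogether. The OCaml sketch below is not the Virtue tp2 code (which is WeiDU script). It assumes, purely for illustration, that the list is kept as newline-separated text, and the names `already_patched` and `patched_list` are hypothetical.

```ocaml
(* Sketch: exact-string membership test, no regexp involved.
   Assumes (hypothetically) the list is stored one file name per line. *)
let already_patched (patched_list : string) (name : string) : bool =
  List.mem name (String.split_on_char '\n' patched_list)

let () =
  let patched_list = "aaa.bcs\nw^something.bcs\nbbb.bcs" in
  (* '^' is just another character in a plain string comparison. *)
  Printf.printf "w^something.bcs listed: %b\n"
    (already_patched patched_list "w^something.bcs");
  (* A partial name is not reported as present. *)
  Printf.printf "w^some.bcs listed: %b\n"
    (already_patched patched_list "w^some.bcs")
```

Because the comparison is on whole strings, no character in a resource name needs special treatment.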

Bar: allowing W^ also involves further tweaks in WeiDU to the BAF parser (for when W^ decides to release a kit and wants to call it in BAF code), and such tweaks are usually a cause of bugs which persist for two or three versions; it's not just a matter of 'add double quotes to a tp2'.
Also, there's the usual idea that "last mod produced must be careful for compatibility" and all of that.

EDIT: even better, there is no "exact^match" double quote construct in OCaml regexp.
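
Even without a quoting construct in the pattern syntax, code written in OCaml itself can treat a run-time string literally: `Str.quote` escapes the special characters, and `Str.regexp_string` builds a regexp that matches exactly the given string. Whether tp2 code can reach anything equivalent is not established in this thread, so the sketch below only illustrates the OCaml side. The helper name `exact_line` is made up.

```ocaml
(* Sketch: literal matching of a run-time string with the Str library
   (e.g. run with `ocaml str.cma`). *)

(* Hypothetical helper: pattern matching a whole line equal to [name]. *)
let exact_line (name : string) : Str.regexp =
  Str.regexp ("^" ^ Str.quote name ^ "$")

let () =
  let name = "w^something" in
  (* Str.quote backslash-escapes '^' and the other special characters. *)
  Printf.printf "quoted: %s\n" (Str.quote name);
  Printf.printf "anchored match: %b\n"
    (Str.string_match (exact_line name) name 0);
  (* Str.regexp_string: unanchored literal search inside a longer text. *)
  let text = "aaa\nw^something\nbbb" in
  Printf.printf "found at offset: %d\n"
    (Str.search_forward (Str.regexp_string name) text 0)
```

The anchors are added outside the quoted part, so they keep their special meaning while the name itself is matched character for character.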
Baronius
Well, my current viewpoint is the following:
Since there are plenty of "legal" prefix variations left, registering prefixes with ^ in this form won't be advised to users. Though if someone insists on it (like WizWom), the registration won't be denied.
WizWom
Well, I'm not insisting, I'm exploring. Nothing I've seen so far makes me think that we've hit a snag; like I said, Refinements installed all components without a glitch. Whether it made sense of the W^ stuff - that is, properly matched the ^ - is not my concern; I don't want other modders playing about with my stuff anyway, and putting in a hidden "no match here, move along" code is a good idea to me.

What I have established is that it doesn't BREAK any code, but does prevent that code from working on my stuff. I may need to explore the CAML regexp, but I seriously doubt they would change UNIX standard regexp syntax that's been around for 37 years. KRA&W knew their stuff.

http://www.weidu.org/~thebigg/README-WeiDU.html#regexp
" interpret the characters inside "" literally
Seems I was correct, WeiDU does have all the typical RegExp syntax.
WizWom
What the Bigg MEANT TO SAY, or rather, SHOULD HAVE said was:

Windows XP has decided that '^' is a special character, and so it must be escaped in file names. Getting that into the right place means a lot of quoting. You may not really be up for it.
NiGHTMARE
If you don't care about compatibility, isn't using a prefix pretty pointless?
WizWom
QUOTE(NiGHTMARE @ Aug 28 2006, 08:42 AM) *

If you don't care about compatibility, isn't using a prefix pretty pointless?

I do care about compatibility.

But '^' introduces no compatibility problems. It introduces some complexity on the part of the coder, perhaps more complexity than I wish, since it seems to be a special character both in regexps and in the Windows XP file parser. Both can be escaped, but WeiDU has a special wrapper over the glob function, which seems to prevent quoting properly for COPY GLOB commands.

But since that is at install, a '?' in the position of the '^' works just fine - since I won't be flailing about with other possible W? combinations in my own install, will I? And if I do, those can go in another directory.

The Bigg
QUOTE(WizWom @ Aug 28 2006, 03:01 PM) *

But '^' introduces no compatibility problems.

Well, sorry to break the bad news, but... it does.

If you use the tp2 modification to Virtue I linked above and do the obvious hack to allow installation on Tutu (which is currently prevented), the new C_E_R section of the script correctly patches BG1 NPCs, as well as Indira, Finch, or Mur'Neth. Jon, however, fails to be patched due to the '^' problem, and will talk as if he was kicked off the team.

EDIT: doh, the thread wasn't linked above. http://forums.pocketplane.net/index.php/topic,21867.0.html; the tp2 modification is the attachment to the third post.
WizWom
So I see. The problem is not what you think it is, but it is still a problem.
Easy enough to use another prefix.