The sole purpose of the typedefs RKCompileErrorCode, RKCompileOption, RKMatchErrorCode, and RKMatchOption is to provide Foundation naming convention equivalents on top of the corresponding PCRE library values. None of the typedefs or RegexKit methods modify the original PCRE values; in fact, the values as defined in the PCRE library's pcre.h header file can still be used in the equivalent RegexKit methods.
The reason for this design is to accommodate later versions of the PCRE library, which may define additional options or error codes. Because RegexKit includes the pcre.h header of the PCRE library it is linked against, the pcre.h values may be used until their equivalents are added to RegexKitTypes.h.
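The aliasing described above can be sketched in plain C. This is a minimal illustration, not the contents of RegexKitTypes.h: the two PCRE values are redefined locally as stand-ins so the fragment compiles without pcre.h (a real build would include the header instead), the enum shows only two members, and demo_uses_multiline and demo_check_aliases are hypothetical helpers invented for the example.

```c
#include <assert.h>

/* Stand-ins for pcre.h so this sketch compiles on its own (assumption:
   these mirror the values in the PCRE header being linked against). */
#ifndef PCRE_MULTILINE
#define PCRE_MULTILINE 0x00000002
#endif
#ifndef PCRE_EXTENDED
#define PCRE_EXTENDED  0x00000008
#endif

/* Foundation-style names layered over the unchanged PCRE values. */
typedef enum {
    RKCompileMultiline = PCRE_MULTILINE,
    RKCompileExtended  = PCRE_EXTENDED
} RKCompileOption;

/* Hypothetical consumer: tests a bit without caring which name set it. */
static int demo_uses_multiline(RKCompileOption options)
{
    return (options & RKCompileMultiline) != 0;
}

/* Hypothetical sanity check: the alias and the PCRE value are the same number. */
static void demo_check_aliases(void)
{
    assert(RKCompileMultiline == PCRE_MULTILINE);
    assert(RKCompileExtended == PCRE_EXTENDED);
}
```

Because the alias is only a second name for the same number, a caller can pass either PCRE_MULTILINE or RKCompileMultiline and the consumer sees identical bits.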
By default, PCRE treats the subject string as consisting of a single line of characters (even if it actually contains newlines). The start of line metacharacter ^ matches only at the start of the string, while the end of line metacharacter $ matches only at the end of the string, or before a terminating newline (unless RKCompileDollarEndOnly is set). This is the same as Perl.
When RKCompileMultiline is set, the start of line and end of line constructs match immediately following or immediately before internal newlines in the subject string, respectively, as well as at the very start and end. This is equivalent to Perl's /m option, and it can be changed within a pattern by a ?m option setting. If there are no newlines in a subject string, or no occurrences of ^ or $ in a pattern, setting RKCompileMultiline has no effect.
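The anchor rules above can be modelled with two small predicates. This is only a sketch of the described semantics, not PCRE or RegexKit code: the function names are hypothetical, the three int parameters stand in for the RKCompileMultiline and RKCompileDollarEndOnly options, and a newline is assumed to be the single character "\n".

```c
#include <assert.h>
#include <stddef.h>

/* Returns 1 if ^ may match at byte position pos of s (length len). */
static int demo_caret_matches(const char *s, size_t len, size_t pos,
                              int multiline)
{
    assert(pos <= len);
    if (pos == 0)
        return 1;                       /* very start of the subject */
    /* Multiline only: immediately after an internal newline. */
    return multiline && pos < len && s[pos - 1] == '\n';
}

/* Returns 1 if $ may match at byte position pos of s (length len). */
static int demo_dollar_matches(const char *s, size_t len, size_t pos,
                               int multiline, int dollar_end_only)
{
    assert(pos <= len);
    if (pos == len)
        return 1;                       /* very end of the subject */
    if (multiline)
        return s[pos] == '\n';          /* immediately before any newline */
    /* Default: also before a single terminating newline,
       unless the dollar-end-only behaviour is requested. */
    return !dollar_end_only && pos == len - 1 && s[pos] == '\n';
}
```

For the subject "ab\ncd\n", the model reports that ^ can match only at offset 0 by default and additionally at offset 3 in multiline mode, while $ can match at offsets 5 and 6 by default and additionally at offset 2 in multiline mode.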
If RKCompileExtended is set, whitespace data characters in the pattern are totally ignored except when escaped or inside a character class. Whitespace does not include the VT character (code 11). In addition, characters between an unescaped # outside a character class and the next newline, inclusive, are also ignored. This is equivalent to Perl's /x option, and it can be changed within a pattern by a ?x option setting.
This option makes it possible to include comments inside complicated patterns. Note, however, that this applies only to data characters. Whitespace characters may never appear within special character sequences in a pattern, for example within the sequence (?( which introduces a conditional subpattern.
Error codes that are returned by getRanges:withCharacters:length:inRange:options:.
An empty string is not considered to be a valid match if this option is set. If there are alternatives in the regular expression, they are tried. If all the alternatives match the empty string, the entire match fails. For example, if a regular expression such as a?b? is applied to a string not beginning with "a" or "b", it matches the empty string at the start of the subject. With RKMatchNotEmpty set, this match is not valid, so PCRE searches further into the string for occurrences of "a" or "b".
Perl has no direct equivalent of RKMatchNotEmpty, but it does make a special case of a pattern match of the empty string within its split() function, and when using the /g modifier. It is possible to emulate Perl's behavior after matching a null string by first trying the match again at the same offset with RKMatchNotEmpty and RKMatchAnchored, and then if that fails by advancing the starting offset (see below) and trying an ordinary match again. There is some code that demonstrates how to do this in the pcredemo.c sample program.
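The retry strategy described above can be sketched as a loop. To keep the example self-contained it does not call PCRE or RegexKit: demo_match is a hypothetical toy matcher hard-wired to the pattern a?b? (chosen here purely for illustration), and DEMO_NOT_EMPTY and DEMO_ANCHORED are stand-ins for RKMatchNotEmpty and RKMatchAnchored. Only the control flow in demo_find_all is the point; it advances by one byte after a failed retry, which assumes a single-byte encoding.

```c
#include <assert.h>
#include <stddef.h>

enum { DEMO_NOT_EMPTY = 1, DEMO_ANCHORED = 2 };   /* stand-in option bits */

/* Toy matcher for the fixed pattern a?b?. Searches from offset; returns 1
   and fills *ms / *me with the match bounds, or returns 0 if nothing matches. */
static int demo_match(const char *s, size_t len, size_t offset, int flags,
                      size_t *ms, size_t *me)
{
    assert(offset <= len);
    for (size_t start = offset; start <= len; start++) {
        size_t end = start;
        if (end < len && s[end] == 'a') end++;     /* optional a */
        if (end < len && s[end] == 'b') end++;     /* optional b */
        if (!((flags & DEMO_NOT_EMPTY) && end == start)) {
            *ms = start;
            *me = end;
            return 1;
        }
        if (flags & DEMO_ANCHORED)
            break;                                 /* may only match at offset */
    }
    return 0;
}

/* Collects up to max matches, emulating Perl's handling of empty matches. */
static size_t demo_find_all(const char *s, size_t len,
                            size_t starts[], size_t ends[], size_t max)
{
    size_t count = 0, offset = 0;
    int last_was_empty = 0;

    while (count < max) {
        int flags = 0;
        size_t ms, me;

        if (last_was_empty) {
            if (offset == len)
                break;                             /* empty match at the end */
            /* Retry at the same offset, refusing another empty match. */
            flags = DEMO_NOT_EMPTY | DEMO_ANCHORED;
        }
        if (!demo_match(s, len, offset, flags, &ms, &me)) {
            if (flags == 0)
                break;                             /* ordinary match failed */
            offset++;                              /* advance, then try normally */
            last_was_empty = 0;
            continue;
        }
        starts[count] = ms;
        ends[count] = me;
        count++;
        offset = me;
        last_was_empty = (ms == me);
    }
    return count;
}
```

Applied to the subject "xab", the loop reports an empty match at offset 0, the match "ab" at offsets 1 to 3, and a final empty match at offset 3, then stops.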
When RKCompileUTF8 is set at compile time, the validity of the subject as a UTF-8 string is automatically checked when getRanges:withCharacters:length:inRange:options: is subsequently called. The value of searchRange location is also checked to ensure that it points to the start of a UTF-8 character. If an invalid UTF-8 sequence of bytes is found, getRanges:withCharacters:length:inRange:options: returns the error RKMatchErrorBadUTF8. If searchRange location contains an invalid value, RKMatchErrorBadUTF8Offset is returned.
If you already know that your subject is valid, and you want to skip these checks for performance reasons, you can set the RKMatchNoUTF8Check option when calling getRanges:withCharacters:length:inRange:options:. You might want to do this for the second and subsequent calls to getRanges:withCharacters:length:inRange:options: if you are making repeated calls to find all the matches in a single subject string. However, you should be sure that the value of searchRange location points to the start of a UTF-8 character. When RKMatchNoUTF8Check is set, the effect of passing an invalid UTF-8 string as a charactersBuffer, or a value of searchRange location that does not point to the start of a UTF-8 character, is undefined. Your program may crash.
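Before setting RKMatchNoUTF8Check, a caller can cheaply confirm that a candidate location falls on a character boundary. The helper below is a hypothetical sketch, not part of RegexKit or PCRE. It relies on the fact that UTF-8 continuation bytes always have the bit pattern 10xxxxxx, and it treats the end of the buffer as a valid boundary. It does not validate the buffer as a whole, so it is only meaningful for a subject already known to be valid UTF-8.

```c
#include <assert.h>
#include <stddef.h>

/* Returns 1 if loc is the start of a UTF-8 character in buf (or the end of
   the buffer), 0 if it lands on a continuation byte or is out of range. */
static int demo_is_utf8_char_start(const unsigned char *buf, size_t len,
                                   size_t loc)
{
    assert(buf != NULL);
    if (loc >= len)
        return loc == len;              /* end of buffer is a boundary */
    return (buf[loc] & 0xC0) != 0x80;   /* 10xxxxxx marks a continuation byte */
}
```

For the four-byte buffer "h\xC3\xA9!" (the letter h, a two-byte e-acute, and an exclamation mark), locations 0, 1, 3 and 4 are accepted and location 2 is rejected because it points into the middle of the two-byte character.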