Defined Types

Overview

The sole purpose of the typedefs RKCompileErrorCode, RKCompileOption, RKMatchErrorCode, and RKMatchOption is to provide Foundation naming convention equivalents for the corresponding PCRE library values. Neither the typedefs nor the RegexKit methods modify the original PCRE values; the values defined in the PCRE library's pcre.h header file can still be used with the equivalent RegexKit methods.

This design accommodates later versions of the PCRE library, which may define additional options or error codes. Because RegexKit includes the pcre.h header of the PCRE library it is linked against, the pcre.h values may be used until their equivalents are added to RegexKitTypes.h.

A bitmap of flags representing the configuration options of the PCRE library when it was initially built. Some options represent default values; others represent features that cannot be altered or added at run time. If a required feature is missing, the underlying PCRE library that RKRegex is linked against will have to be changed. This will most likely require rebuilding the PCRE library and the RegexKit framework from source with the desired configuration options.
Constants
No build config options specified.
Set if the PCRE library was compiled with UTF-8 support. This feature is normally enabled for the RegexKit framework by default. See UTF-8 Support and UTF-8 and Unicode Property Support for more information.
Set if the PCRE library was compiled with Unicode Properties support, enabling the regular expression pattern escapes \P, \p, and \X. This feature is normally enabled for the RegexKit framework by default. See Unicode Character Property Support and UTF-8 and Unicode Property Support for more information.
The default character sequence. See Code Value of Newline for more information.
The character 13 (carriage return, CR) is the default end of line character. See Code Value of Newline for more information.
The character 10 (linefeed, LF) is the default end of line character. See Code Value of Newline for more information.
The character sequence 13 (carriage return, CR), 10 (linefeed, LF) is the default end of line character sequence. See Code Value of Newline for more information.
Any valid Unicode newline sequence is the default end of line. See Code Value of Newline for more information.
The default end of line character sequence is a combination of RKBuildConfigNewlineCR, RKBuildConfigNewlineLF, and RKBuildConfigNewlineCRLF. See Code Value of Newline for more information.
A bitmask to extract only the newline setting. See Code Value of Newline and Newlines for more information.
The regular expression escape sequence \R matches only CR, LF, or CRLF.
The regular expression escape sequence \R matches any Unicode line ending sequence.
Declared In
RegexKitTypes.h
The error reported by the PCRE library when attempting to compile a regular expression.
typedef enum {
} RKCompileErrorCode;
Constants
\ at end of pattern.
Unrecognized character follows \.
Numbers out of order in {} quantifier.
Number too big in {} quantifier.
Missing terminating ] for character class.
Invalid escape sequence in character class.
Range out of order in character class.
Nothing to repeat.
Internal error, unexpected repeat.
Unrecognized character after (?.
POSIX named classes are supported only within a class.
Reference to non-existent subpattern.
Internal error, erroffset passed as NULL.
Regular expression too large.
Memory allocation failure.
Unmatched parentheses.
Internal error, code overflow.
Lookbehind assertion is not fixed length.
Malformed number or name after (?(.
Conditional group contains more than two branches.
Assertion expected after (?(.
(?R or (?digits must be followed by ).
Unknown POSIX class name.
POSIX collating elements are not supported.
The PCRE library was not built with UTF-8 support. See RKBuildConfigUTF8.
Character value in \x{...} sequence is too large.
Invalid condition (?(0).
\C not allowed in lookbehind assertion.
PCRE does not support \L, \l, \N, \U, or \u.
Number after (?C is > 255.
Closing ) for (?C expected.
Recursive call could loop indefinitely.
Unrecognized character after (?P.
Syntax error in subpattern name (missing terminator).
Two named subpatterns have the same name. See RKCompileDupNames.
Invalid UTF-8 string.
The PCRE library was not built with Unicode support. \P, \p, and \X are invalid. See RKBuildConfigUnicodeProperties.
Malformed \P or \p sequence.
Unknown property name after \P or \p.
Subpattern name is too long (maximum 32 characters).
Too many named subpatterns (maximum 10,000).
Repeated subpattern is too long.
Octal value is greater than \377 (not in UTF-8 mode).
Internal error, overran compiling workspace.
Internal error, previously-checked referenced subpattern not found.
DEFINE group contains more than one branch.
Repeating a DEFINE group is not allowed.
\g must be followed by a non-zero number or a braced name or number (i.e., {name} or {0123}).
The relative subpattern reference parameter to (?+ , (?- , (?(+ , or (?(- must be followed by a non-zero number.
Declared In
RegexKitTypes.h
A collection of bitmask options that can be combined together and passed via the options argument of regexWithRegexString:options: or initWithRegexString:options:.
typedef enum {
=
1 << 0,
=
1 << 1,
=
1 << 2,
=
1 << 3,
=
1 << 4,
=
1 << 6,
=
1 << 9,
=
1 << 11,
=
1 << 18,
=
1 << 19,
=
(RKCompileCaseless | RKCompileMultiline | RKCompileDotAll | RKCompileExtended | RKCompileAnchored | RKCompileDollarEndOnly | RKCompileExtra | RKCompileUngreedy | RKCompileUTF8 | RKCompileNoAutoCapture | RKCompileNoUTF8Check | RKCompileAutoCallout | RKCompileFirstLine | RKCompileDupNames | RKCompileBackslashRAnyCRLR | RKCompileBackslashRUnicode),
=
(RKCompileAutoCallout),
=
0x00000000,
=
0x00100000,
=
0x00200000,
=
0x00300000,
=
0x00400000,
=
0x00500000,
=
0x00700000,
} RKCompileOption;
Constants
No specific options.
If this bit is set, letters in the pattern match both upper and lower case letters. It is equivalent to Perl's /i option, and it can be changed within a pattern by a ?i option setting. In UTF-8 mode, PCRE always understands the concept of case for characters whose values are less than 128, so caseless matching is always possible. For characters with higher values, the concept of case is supported if the PCRE library is built with Unicode property support, but not otherwise. If you want to use caseless matching for characters 128 and above, you must ensure that the PCRE library is built with Unicode property support as well as with UTF-8 support. See RKBuildConfig.

By default, PCRE treats the subject string as consisting of a single line of characters (even if it actually contains newlines). The start of line metacharacter ^ matches only at the start of the string, while the end of line metacharacter $ matches only at the end of the string, or before a terminating newline (unless RKCompileDollarEndOnly is set). This is the same as Perl.

When RKCompileMultiline is set, the start of line and end of line constructs match immediately following or immediately before internal newlines in the subject string, respectively, as well as at the very start and end. This is equivalent to Perl's /m option, and it can be changed within a pattern by a ?m option setting. If there are no newlines in a subject string, or no occurrences of ^ or $ in a pattern, setting RKCompileMultiline has no effect.

If this bit is set, a dot metacharacter in the pattern matches all characters, including those that indicate newline. Without it, a dot does not match when the current position is at a newline. This option is equivalent to Perl's /s option, and it can be changed within a pattern by a ?s option setting. A negative class such as [^a] always matches newline characters, independent of the setting of this option.

If this bit is set, whitespace data characters in the pattern are totally ignored except when escaped or inside a character class. Whitespace does not include the VT character (code 11). In addition, characters between an unescaped # outside a character class and the next newline, inclusive, are also ignored. This is equivalent to Perl's /x option, and it can be changed within a pattern by a ?x option setting.

This option makes it possible to include comments inside complicated patterns. Note, however, that this applies only to data characters. Whitespace characters may never appear within special character sequences in a pattern, for example within the sequence (?( which introduces a conditional subpattern.

If this bit is set, the pattern is forced to be "anchored", that is, it is constrained to match only at the first matching point in the string that is being searched (the "subject string"). This effect can also be achieved by appropriate constructs in the pattern itself, which is the only way to do it in Perl.
If this bit is set, a dollar metacharacter in the pattern matches only at the end of the subject string. Without this option, a dollar also matches immediately before a newline at the end of the string (but not before any other newlines). The RKCompileDollarEndOnly option is ignored if RKCompileMultiline is set. There is no equivalent to this option in Perl, and no way to set it within a pattern.
This option was invented in order to turn on additional functionality of PCRE that is incompatible with Perl, but it is currently of very little use. When set, any backslash in a pattern that is followed by a letter that has no special meaning causes an error, thus reserving these combinations for future expansion. By default, as in Perl, a backslash followed by a letter with no special meaning is treated as a literal. (Perl can, however, be persuaded to give a warning for this.) There are at present no other features controlled by this option. It can also be set by a ?X option setting within a pattern.
This option inverts the "greediness" of the quantifiers so that they are not greedy by default, but become greedy if followed by ?. It is not compatible with Perl. It can also be set by a ?U option setting within the pattern.
This option causes PCRE to regard both the pattern and the subject as strings of UTF-8 characters instead of single-byte character strings. However, it is available only when the PCRE library is built to include UTF-8 support. If not, the use of this option returns an error. See UTF-8 and Unicode Property Support for more information.
If this option is set, it disables the use of numbered capturing parentheses in the pattern. Any opening parenthesis that is not followed by ? behaves as if it were followed by ?: but named parentheses can still be used for capturing (and they acquire numbers in the usual way). There is no equivalent of this option in Perl.
When RKCompileUTF8 is set, the validity of the pattern as a UTF-8 string is automatically checked. If an invalid UTF-8 sequence of bytes is found, initWithRegexString:options: returns an error. If you already know that your pattern is valid, and you want to skip this check for performance reasons, you can set the RKCompileNoUTF8Check option. When it is set, the effect of passing an invalid UTF-8 string as a pattern is undefined. It may cause your program to crash. Note that RKMatchNoUTF8Check can also be passed to getRanges:withCharacters:length:inRange:options: to suppress the UTF-8 validity checking of subject strings.

If this bit is set, initWithRegexString:options: automatically inserts callout items, all with number 255, before each pattern item. For discussion of the callout facility, see the PCRE Callouts documentation.

Important:
Use of callouts is unsupported and will raise an RKRegexUnsupportedException.
If this option is set, an unanchored pattern is required to match before or at the first newline in the subject string, though the matched text may continue over the newline.
If this bit is set, names used to identify capturing subpatterns need not be unique. This can be helpful for certain types of regular expressions when it is known that only one instance of the named subpattern can ever be matched. See Named Subpatterns for more information. The option may also be set by specifying the (?J) option in the regular expression.
The escape sequence \R for the compiled regular expression will match only CR, LF, or CRLF. This option is mutually exclusive of RKCompileBackslashRUnicode.
The escape sequence \R for the compiled regular expression will match any Unicode line ending sequence. This option is mutually exclusive of RKCompileBackslashRAnyCRLR.
Contains a bitmask of all the defined options.
Contains a bitmask of invalid options.
The default newline sequence defined when the PCRE library was built.
The character 13 (carriage return, CR) is the default end of line character.
The character 10 (linefeed, LF) is the default end of line character.
The character sequence 13 (carriage return, CR), 10 (linefeed, LF) is the default end of line character sequence.
Any valid Unicode newline sequence is the default end of line.
Any of the newline character sequences from RKCompileNewlineCR, RKCompileNewlineLF, or RKCompileNewlineCRLF will be used as a match for the end of line character sequence.
A bitmask to extract only the newline setting.
The number of bits that the newline type is shifted to the left.
Declared In
RegexKitTypes.h
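
The newline constants occupy a multi-bit field inside the option word, so they are selected and read back differently from the single-bit flags. The following C sketch illustrates the mask and shift described above. The EXC_ names are hypothetical stand-ins that exist only for this example, not the RegexKit definitions: the newline values are copied from the hexadecimal entries of the enum, the shift of 20 is inferred from 0x00100000 being 1 << 20, and the two single-bit flags assume the PCRE_CASELESS and PCRE_MULTILINE values from pcre.h.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-ins for illustration only. */
enum {
    EXC_CASELESS      = 1 << 0,      /* assumed to mirror PCRE_CASELESS  */
    EXC_MULTILINE     = 1 << 1,      /* assumed to mirror PCRE_MULTILINE */
    EXC_NEWLINE_CR    = 0x00100000,
    EXC_NEWLINE_LF    = 0x00200000,
    EXC_NEWLINE_CRLF  = 0x00300000,
    EXC_NEWLINE_MASK  = 0x00700000,
    EXC_NEWLINE_SHIFT = 20           /* 0x00100000 == 1 << 20 */
};

/* Return only the newline field, still in its shifted position. */
static uint32_t exc_newline_field(uint32_t options) {
    return options & EXC_NEWLINE_MASK;
}

/* Return the newline type as a small integer:
   0 = build default, 1 = CR, 2 = LF, 3 = CRLF, and so on. */
static uint32_t exc_newline_index(uint32_t options) {
    return (options & EXC_NEWLINE_MASK) >> EXC_NEWLINE_SHIFT;
}

/* Replace the newline field and leave every other option bit untouched. */
static uint32_t exc_set_newline(uint32_t options, uint32_t newline) {
    /* The new value may only contain bits that belong to the newline field. */
    assert((newline & ~(uint32_t)EXC_NEWLINE_MASK) == 0);
    return (options & ~(uint32_t)EXC_NEWLINE_MASK) | newline;
}
```

Because the CRLF value is numerically the CR and LF values combined, testing a single newline constant with a bitwise AND can give a false positive. Comparing the masked field for equality, as exc_newline_field allows, avoids that.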

Error codes that are returned by getRanges:withCharacters:length:inRange:options:.

Note:
All RKMatchErrorCode error codes are < 0.
Constants
The subject string did not match the regular expression.
An unrecognized bit was set in the RKMatchOption options argument.
PCRE stores a 4-byte "magic number" at the start of the compiled code, to catch the case when it is passed an invalid pointer and to detect when a pattern that was compiled in an environment of one endianness is run in an environment with the other endianness. This is the error that PCRE gives when the magic number is not present.
While running the pattern match, an unknown item was encountered in the compiled pattern. This error could be caused by a bug in PCRE or by overwriting of the compiled pattern.
If a pattern contains back references and the internal matching buffers used by getRanges:withCharacters:length:inRange:options: are not big enough to hold the referenced substrings, then the PCRE library will allocate a block of memory at the start of matching to use for this purpose. If the PCRE library is unable to allocate the additional memory, this error is returned.
The internal backtracking limit was reached.

This error is never generated by getRanges:withCharacters:length:inRange:options: itself. It is provided for use by callout functions that want to yield a distinctive error code. See the PCRE Callouts documentation for details.

Important:
Use of callouts is unsupported and will raise an RKRegexUnsupportedException.
A string that contains an invalid UTF-8 byte sequence was passed as a subject.
The UTF-8 byte sequence that was passed as a subject was valid, but the value of searchRange.location did not point to the beginning of a UTF-8 character.
The subject string did not match, but it did match partially. See the Partial Matching in PCRE documentation for details.
The RKMatchPartial option was used with a compiled pattern containing items that are not supported for partial matching. See the Partial Matching in PCRE documentation for details.
An unexpected internal error has occurred. This error could be caused by a bug in PCRE or by overwriting of the compiled pattern.
The internal recursion limit was reached.
When a group that can match an empty substring is repeated with an unbounded upper limit, the subject position at the start of the group must be remembered, so that a test for an empty string can be made when the end of the group is reached. Some workspace is required for this; if it runs out, this error is given.
An invalid combination of RKMatchNewlineMask options was given.
Declared In
RegexKitTypes.h
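
Since every error code is negative, a caller can separate success from failure with a sign test before looking at specific codes. The C sketch below shows one way to do that. All names are hypothetical and local to the example; the two numeric codes assume the PCRE_ERROR_NOMATCH and PCRE_ERROR_PARTIAL values from pcre.h, and treating every non-negative result as a match is an assumption made for illustration.

```c
#include <assert.h>

/* Hypothetical stand-ins; values assumed to mirror PCRE_ERROR_NOMATCH
   and PCRE_ERROR_PARTIAL from pcre.h. */
enum { EX_ERROR_NOMATCH = -1, EX_ERROR_PARTIAL = -12 };

typedef enum {
    EX_RESULT_MATCHED,
    EX_RESULT_NO_MATCH,
    EX_RESULT_PARTIAL,
    EX_RESULT_FAILURE
} ex_result_kind;

/* Sort a raw return code into one of four broad outcomes. */
static ex_result_kind ex_classify(int rc) {
    if (rc >= 0) return EX_RESULT_MATCHED;   /* error codes are all < 0 */
    switch (rc) {
        case EX_ERROR_NOMATCH: return EX_RESULT_NO_MATCH;
        case EX_ERROR_PARTIAL: return EX_RESULT_PARTIAL;
        default:               return EX_RESULT_FAILURE;
    }
}

/* Short human-readable label for logging. */
static const char *ex_describe(int rc) {
    switch (ex_classify(rc)) {
        case EX_RESULT_MATCHED:  return "matched";
        case EX_RESULT_NO_MATCH: return "no match";
        case EX_RESULT_PARTIAL:  return "partial match";
        case EX_RESULT_FAILURE:  return "error";
    }
    assert(!"unreachable: ex_classify returned an unknown kind");
    return "";
}
```

Keeping "no match" and "partial match" apart from the remaining codes reflects that they describe the subject string, while the other codes describe a problem with the call or the compiled pattern.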
A collection of bitmask options that can be combined together and passed via the options argument of getRanges:withCharacters:length:inRange:options: or one of the other RKRegex matching methods.
typedef enum {
=
1 << 4,
=
1 << 10,
=
1 << 13,
=
1 << 15,
=
0x00000000,
=
0x00100000,
=
0x00200000,
=
0x00300000,
=
0x00400000,
=
0x00500000,
=
0x00700000,
} RKMatchOption;
Constants
No specific options.
The RKMatchAnchored option limits getRanges:withCharacters:length:inRange:options: to matching at the first matching position. If the regular expression was compiled with RKCompileAnchored, or turned out to be anchored by virtue of its contents, it cannot be made unanchored at matching time.
This option specifies that the first character of the subject string is not the beginning of a line, so the circumflex metacharacter should not match before it. Setting this without RKCompileMultiline (at compile time) causes circumflex never to match. This option affects only the behavior of the circumflex metacharacter. It does not affect \A.
This option specifies that the end of the subject string is not the end of a line, so the dollar metacharacter should not match it nor (except in RKCompileMultiline mode) a newline immediately before it. Setting this without RKCompileMultiline (at compile time) causes dollar never to match. This option affects only the behavior of the dollar metacharacter. It does not affect \Z or \z.

An empty string is not considered to be a valid match if this option is set. If there are alternatives in the regular expression, they are tried. If all the alternatives match the empty string, the entire match fails. For example, if the regular expression

a?b?

is applied to a string not beginning with "a" or "b", it matches the empty string at the start of the subject. With RKMatchNotEmpty set, this match is not valid, so PCRE searches further into the string for occurrences of "a" or "b".

Perl has no direct equivalent of RKMatchNotEmpty, but it does make a special case of a pattern match of the empty string within its split() function, and when using the /g modifier. It is possible to emulate Perl's behavior after matching a null string by first trying the match again at the same offset with RKMatchNotEmpty and RKMatchAnchored, and then, if that fails, by advancing the starting offset and trying an ordinary match again. There is some code that demonstrates how to do this in the pcredemo.c sample program.
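
The retry procedure described above can be sketched in plain C. To keep the example self-contained it does not call RegexKit or PCRE: ex_match_ab is a toy matcher hard-wired to the pattern a?b?, and the two option bits are hypothetical stand-ins whose values assume those of PCRE_ANCHORED and PCRE_NOTEMPTY. Only the loop in ex_count_matches is the point of the example; it follows the approach the paragraph describes rather than reproducing pcredemo.c.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical option bits for the toy matcher. */
enum { EX_ANCHORED = 0x0010, EX_NOTEMPTY = 0x0400 };

/* Toy matcher for the pattern a?b? only. Tries each position from `start`
   (only `start` itself when EX_ANCHORED is set). On success returns 1 and
   stores the half-open match range in *mstart and *mend. */
static int ex_match_ab(const char *s, size_t n, size_t start, unsigned opts,
                       size_t *mstart, size_t *mend) {
    for (size_t p = start; p <= n; p++) {
        size_t e = p;
        if (e < n && s[e] == 'a') e++;
        if (e < n && s[e] == 'b') e++;
        if (e > p || !(opts & EX_NOTEMPTY)) {
            *mstart = p;
            *mend = e;
            return 1;
        }
        if (opts & EX_ANCHORED) break;
    }
    return 0;
}

/* Count every match in `s`, handling empty matches the way the text describes. */
static size_t ex_count_matches(const char *s) {
    assert(s != NULL);
    size_t n = strlen(s), pos = 0, count = 0;
    int prev_empty = 0;

    for (;;) {
        unsigned opts = 0;
        if (prev_empty) {
            if (pos == n) break;               /* empty match at the very end */
            opts = EX_NOTEMPTY | EX_ANCHORED;  /* retry at the same offset    */
        }
        size_t ms, me;
        if (!ex_match_ab(s, n, pos, opts, &ms, &me)) {
            if (opts == 0) break;  /* ordinary match failed: no more matches */
            pos++;                 /* retry failed: advance one byte         */
            prev_empty = 0;
            continue;
        }
        count++;
        pos = me;
        prev_empty = (ms == me);
    }
    return count;
}
```

The sketch advances by one byte after a failed retry. With a UTF-8 subject the offset would have to move to the start of the next character instead.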

When RKCompileUTF8 is set at compile time, the validity of the subject as a UTF-8 string is automatically checked when getRanges:withCharacters:length:inRange:options: is subsequently called. The value of searchRange location is also checked to ensure that it points to the start of a UTF-8 character. If an invalid UTF-8 sequence of bytes is found, getRanges:withCharacters:length:inRange:options: returns the error RKMatchErrorBadUTF8. If searchRange location contains an invalid value, RKMatchErrorBadUTF8Offset is returned.

If you already know that your subject is valid, and you want to skip these checks for performance reasons, you can set the RKMatchNoUTF8Check option when calling getRanges:withCharacters:length:inRange:options:. You might want to do this for the second and subsequent calls to getRanges:withCharacters:length:inRange:options: if you are making repeated calls to find all the matches in a single subject string. However, you should be sure that the value of searchRange location points to the start of a UTF-8 character. When RKMatchNoUTF8Check is set, the effect of passing an invalid UTF-8 string as a charactersBuffer, or a value of searchRange location that does not point to the start of a UTF-8 character, is undefined. Your program may crash.

This option turns on the partial matching feature. If the subject string fails to match the regular expression, but at some point during the matching process the end of the subject was reached (that is, the subject partially matches the pattern and the failure to match occurred only because there were not enough subject characters), getRanges:withCharacters:length:inRange:options: returns RKMatchErrorPartial instead of RKMatchErrorNoMatch. When RKMatchPartial is used, there are restrictions on what may appear in the pattern. These are discussed in Partial Matching in PCRE.
The default newline sequence defined when the PCRE library was built.
The character 13 (carriage return, CR) is used as the end of line character during the match.
The character 10 (linefeed, LF) is used as the end of line character during the match.
The character sequence 13 (carriage return, CR), 10 (linefeed, LF) is used as the end of line character sequence during the match.
Any valid Unicode newline sequence is used as the end of line during the match.
RKMatchNewlineCR, RKMatchNewlineLF, and RKMatchNewlineCRLF will be used as the end of line character sequence during the match.
A bitmask to extract only the newline setting.
The escape sequence \R in the compiled regular expression will match only CR, LF, or CRLF, temporarily overriding the setting used when the regular expression was compiled. This option is mutually exclusive of RKMatchBackslashRUnicode.
The escape sequence \R in the compiled regular expression will match any Unicode line ending sequence, temporarily overriding the setting used when the regular expression was compiled. This option is mutually exclusive of RKMatchBackslashRAnyCRLR.
Declared In
RegexKitTypes.h
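
Two constraints on a match option word can be checked before it is passed to a matching method: the two \R options are mutually exclusive, and the newline field should hold one of the listed newline values. The C sketch below shows such a check. The EXM_ names are hypothetical and local to the example; the \R values assume PCRE_BSR_ANYCRLF and PCRE_BSR_UNICODE from pcre.h, the newline values are taken from the enum above, and rejecting newline fields larger than 0x00500000 is an assumption about which combinations count as invalid.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-ins for illustration only. */
enum {
    EXM_NEWLINE_ANYCRLF = 0x00500000,  /* largest newline value listed      */
    EXM_NEWLINE_MASK    = 0x00700000,
    EXM_BSR_ANYCRLF     = 0x00800000,  /* assumed: PCRE_BSR_ANYCRLF         */
    EXM_BSR_UNICODE     = 0x01000000   /* assumed: PCRE_BSR_UNICODE         */
};

/* Return 1 if the option word passes both checks, 0 otherwise. */
static int exm_options_are_consistent(uint32_t options) {
    uint32_t both = EXM_BSR_ANYCRLF | EXM_BSR_UNICODE;
    if ((options & both) == both) return 0;  /* both \R modes requested */
    if ((options & EXM_NEWLINE_MASK) > EXM_NEWLINE_ANYCRLF) return 0;
    return 1;
}

/* Select one \R mode, clearing the other so the result stays consistent. */
static uint32_t exm_set_bsr(uint32_t options, uint32_t bsr) {
    assert(bsr == EXM_BSR_ANYCRLF || bsr == EXM_BSR_UNICODE);
    return (options & ~(uint32_t)(EXM_BSR_ANYCRLF | EXM_BSR_UNICODE)) | bsr;
}
```

Routing every change of the \R mode through a helper such as exm_set_bsr means the mutually exclusive pair can never both be set by accident.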
 