The RKRegex class declares the programmatic interface for the RKRegex framework to the PCRE regular expression pattern matching library.
Some of the noteworthy features provided by the RKRegex class are:
The RKRegex class provides the low level primitives necessary to perform regular expression matching. The matching functions perform their work on raw byte buffers and provide match results in the form of NSRange structures containing the range of a match and the range of any matching subpatterns of a regular expression.
In addition to the low level matching primitives, the RKRegex class provides information about the underlying PCRE library, such as the version via the PCREVersionString method and the PCRE compile time options via PCREBuildConfig. It also provides methods for obtaining information about the named capture subpatterns, if any, of an instantiated RKRegex compiled regular expression.
In general, the RKRegex class is not used by end-user applications directly. Since the RKRegex class only provides low level primitives, end-user functionality is provided via various category extensions to common Foundation objects, such as the RegexKit framework additions to NSArray, NSDictionary, NSSet, NSString, and their mutable variants. Match enumeration is provided by the RKEnumerator class.
Unicode strings are fully supported by RegexKit, both in the regular expression pattern and the string to search.
The Foundation additions use Unicode strings exclusively for the buffer that RKRegex performs matches against. If a NSString has an encoding format other than ASCII, it is first converted to UTF8 before any matching can occur. Because of this, RKRegex objects must have the RKCompileOption flags RKCompileUTF8 and RKCompileNoUTF8Check set. Bytes with the most significant bit set in UTF8 encoded strings have special meaning that must be interpreted. Without these options set, PCRE treats the buffer as a collection of 8 bit bytes without the required UTF8 decoding.
The various Foundation additions will accept either a RKRegex or a NSString for the regex argument. If the supplied object is a NSString, it is automatically converted to a RKRegex object via the regexWithRegexString:options: method with an option argument of (RKCompileUTF8 | RKCompileNoUTF8Check).
For the purposes of calculating character indexes, Foundation treats all strings as if they were UTF16 encoded. PCRE, on the other hand, uses UTF8 exclusively. This has important consequences when using strings that are encoded in anything but ASCII. It is important to understand that all of the Foundation additions, and the RKEnumerator class, calculate all character index values as UTF16 character indexes. Since PCRE can only operate on UTF8 encoded strings, this requires any NSRange values to be converted between the two character index spaces. This provides transparent interoperability with the rest of Foundation at the expense of having to perform the character index conversion.
However, the RKRegex methods use UTF8 character indexes for all NSRange values. This is an important distinction as NSRange values returned by RKRegex objects will result in undefined behavior if passed to NSString objects without converting to the equivalent UTF16 character indexes. The functions RKConvertUTF8ToUTF16RangeForString and RKConvertUTF16ToUTF8RangeForString can be used to perform the necessary conversions, if required.
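To illustrate why the two index spaces differ, the following is a minimal sketch in plain C of the kind of conversion involved. It is not the RegexKit implementation of RKConvertUTF8ToUTF16RangeForString; the IndexRange type and both function names are hypothetical stand-ins, and the sketch assumes the buffer is valid UTF8 and that the range begins and ends on character boundaries.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for NSRange: a location and a length. */
typedef struct { size_t location; size_t length; } IndexRange;

/* Count the UTF16 code units needed to represent the first `bytes`
   bytes of a valid UTF8 buffer. Continuation bytes (10xxxxxx) add
   nothing; the lead byte of a four byte sequence (0xF0 and above)
   becomes a surrogate pair, which is two UTF16 code units. */
static size_t utf16_units_for_utf8_prefix(const char *utf8, size_t bytes)
{
    const unsigned char *p = (const unsigned char *)utf8;
    size_t units = 0;
    for (size_t i = 0; i < bytes; i++) {
        if ((p[i] & 0xC0) == 0x80) continue;   /* continuation byte */
        units += (p[i] >= 0xF0) ? 2 : 1;
    }
    return units;
}

/* Convert a range of UTF8 byte indexes into the equivalent range of
   UTF16 code unit indexes for the same text. */
IndexRange utf8_range_to_utf16(const char *utf8, IndexRange utf8Range)
{
    assert(utf8 != NULL);
    IndexRange out;
    out.location = utf16_units_for_utf8_prefix(utf8, utf8Range.location);
    out.length = utf16_units_for_utf8_prefix(utf8, utf8Range.location + utf8Range.length)
               - out.location;
    return out;
}
```

For example, in the UTF8 bytes of "café bar" the word "bar" starts at byte index 6 because "é" occupies two bytes, while in UTF16 indexes it starts at 5. For pure ASCII text the two ranges are identical, which is why the distinction only matters for non-ASCII strings.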
The reasoning behind this is that the RKRegex class provides low level access to the PCRE engine, while the Foundation additions provide abstracted access to the underlying pattern matching engine. There are still many useful tasks that can be performed with low-level access, such as not enabling RKCompileUTF8 and matching raw binary byte buffers. Therefore, the RKRegex class tries to provide unadulterated access to the PCRE matching engine for those users who have special requirements.
The RKRegex class fully supports the NSCoding protocol. When a RKRegex is archived, the regular expression string used to create the receiver is coded, along with any RKCompileOption options. In addition to these two main items, the version and RKBuildConfig flags are also encoded to aid in debugging any unarchiving issues.
If problems are encountered when attempting to initialize a coded RKRegex regular expression, a NSInvalidUnarchiveOperationException is raised. The userInfo portion of the exception contains additional information regarding the failed attempt. Some of the additional information includes any difference in the archiving RKRegex PCRE version, any unknown or unsupported RKCompileOption flags for the current RegexKit, and any differences in RKBuildConfig flags.
Used primarily when a regular expression is compiled with RKCompileDupNames, or when the (?J) option has been set, to determine the capture index of the first successful match in the matchedRanges result from getRanges:withCharacters:length:inRange:options:. If none of the captures named captureNameString matched successfully, NSNotFound is returned.
May be used when a regular expression is not compiled with RKCompileDupNames or there is only a single instance of captureNameString, in which case the result will be the capture index of captureNameString only if captureNameString successfully matched, otherwise NSNotFound is returned.
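The lookup described above can be sketched in plain C as follows. This is only an illustration of the selection rule, not the RegexKit implementation: the IndexRange type, the INDEX_NOT_FOUND constant (a stand-in for NSNotFound), and the function name are hypothetical, and the sketch assumes that a capture which did not participate in the match is reported with a location of INDEX_NOT_FOUND.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for NSRange and NSNotFound. */
typedef struct { size_t location; size_t length; } IndexRange;
#define INDEX_NOT_FOUND ((size_t)-1)

/* `candidates` lists, in pattern order, every capture index that
   shares the requested capture name. Return the first one whose entry
   in `matched` took part in the match, or INDEX_NOT_FOUND if none did. */
size_t first_matched_capture(const size_t *candidates, size_t candidateCount,
                             const IndexRange *matched, size_t captureCount)
{
    assert(matched != NULL);
    for (size_t i = 0; i < candidateCount; i++) {
        size_t idx = candidates[i];
        if (idx < captureCount && matched[idx].location != INDEX_NOT_FOUND)
            return idx;
    }
    return INDEX_NOT_FOUND;
}
```

As a usage example, consider a pattern such as (?J)(?<year>\d{4})-\d\d|\d\d-(?<year>\d{4}), where captures 1 and 2 are both named year. Matching "12-2007" would leave capture 1 unmatched and capture 2 set, so the function returns 2 for the candidate list {1, 2}.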
This method is similar to captureIndexForCaptureName:inMatchedRanges: except that it optionally returns a NSError object for error conditions instead of throwing an exception. The error parameter may be set to nil if information about the error is not required.
If the regular expression of the receiver uses named subcaptures (ie, (?<year>(\d\d)?\d\d) ), then for each capture name there exists a corresponding capture index. A NSArray is created with captureCount elements and for every capture name the corresponding array index is set to a NSString of the capture name. If there is no capture name for an index, a NSNull is used instead.
This method returns nil if the receiver's regular expression does not contain any named subcaptures.
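The layout of the resulting array can be sketched in plain C as shown below. This is an illustrative model only: the NameEntry type and the function name are hypothetical, a NULL pointer stands in for NSNull, and the sketch assumes that captureCount includes capture index 0 (the entire match).

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical (capture name, capture index) pair. */
typedef struct { const char *name; size_t index; } NameEntry;

/* Build an array of captureCount slots in which slot i holds the name
   of capture i, or NULL (standing in for NSNull) if capture i has no
   name. Returns NULL if there are no named captures at all, mirroring
   the nil result described above. The caller must free() the result. */
const char **capture_name_array(const NameEntry *entries, size_t entryCount,
                                size_t captureCount)
{
    assert(captureCount > 0);
    if (entryCount == 0) return NULL;

    const char **names = malloc(captureCount * sizeof *names);
    if (names == NULL) return NULL;

    for (size_t i = 0; i < captureCount; i++) names[i] = NULL;
    for (size_t i = 0; i < entryCount; i++)
        if (entries[i].index < captureCount)
            names[entries[i].index] = entries[i].name;
    return names;
}
```

For the pattern (?<year>(\d\d)?\d\d) under these assumptions, the array would have three slots: an empty slot for capture 0, "year" for capture 1, and an empty slot for the unnamed inner capture 2.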
This method is the low level matching primitive to the PCRE library.
getRanges:withCharacters:length:inRange:options: allocates all of the memory needed to perform the regular expression matching and store any temporary results on the stack. The match results, if any, are translated from the PCRE library format to the equivalent NSRange format and stored in the caller supplied ranges NSRange array. For nearly all cases this means that there is no associated malloc() overhead involved. See rangesForCharacters:length:inRange:options:, which creates an autorelease buffer to store the results, if the caller is unable to provide a suitable buffer.
It is important to note that setting the searchRange.location and adding the equivalent offset to charactersBuffer are not the same thing. The value of charactersBuffer marks the hard start of the buffer, whereas a positive searchRange.location makes the characters from charactersBuffer up to searchRange.location available to the matching engine. This is an important distinction for some types of regular expressions, such as those that use lookbehind (ie, (?<=)), which may require examining characters that are strictly not within searchRange.
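The difference can be demonstrated without PCRE using a small hand-written matcher. The following plain C sketch emulates the pattern (?<=foo)bar; the function name and the NO_MATCH constant are hypothetical, and the matcher is only a stand-in for the behavior of a real lookbehind, not a description of how PCRE implements it.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define NO_MATCH ((size_t)-1)

/* Emulates (?<=foo)bar. Candidate matches begin at or after
   searchStart, but the lookbehind test may read bytes located before
   searchStart. It can never read before `buffer`, which is the hard
   start of the text. */
size_t find_bar_after_foo(const char *buffer, size_t length, size_t searchStart)
{
    assert(buffer != NULL && searchStart <= length);
    for (size_t i = searchStart; i + 3 <= length; i++) {
        if (memcmp(buffer + i, "bar", 3) != 0) continue;
        /* Lookbehind: the three bytes before i must be "foo". */
        if (i >= 3 && memcmp(buffer + i - 3, "foo", 3) == 0) return i;
    }
    return NO_MATCH;
}
```

Searching "foobar" with a search start of 3 finds "bar" at index 3, because the preceding "foo" is still visible to the lookbehind. Passing the buffer advanced by three bytes with a search start of 0 finds nothing, because "foo" now lies before the hard start of the buffer.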
Unlike initWithRegexString:options:, this method does not throw an exception on errors. Instead, a NSError object is created and returned via the optional error parameter.
Raises RKRegexSyntaxErrorException if regexString in combination with options is not a valid regular expression. The exception provides a userInfo dictionary containing the following keys and information:
|Key||Object Type||Description|
|regexString||NSString||The regexString regular expression that caused the exception.|
|regexStringErrorLocation||NSNumber||The location of the character that caused the syntax error.|
|regexAttributedString||NSAttributedString||The regexString regular expression with a NSBackgroundColorAttributeName set to [NSColor redColor] for the character that caused the error along with the NSToolTipAttributeName attribute (if supported) set to errorString.|
|errorString||NSString||The error string that the PCRE library returned.|
|RKCompileOption||NSNumber||The RKCompileOption that was passed with regexString.|
|RKCompileOptionString||NSString||A human readable C bitwise OR equivalent string of RKCompileOption options.|
|RKCompileOptionArray||NSArray||The human readable equivalent of the individual C bitwise RKCompileOption options flags in a NSArray.|
|RKCompileErrorCode||NSNumber||The RKCompileErrorCode that the PCRE library returned.|
|RKCompileErrorCodeString||NSString||A human readable equivalent of the RKCompileErrorCode name that the PCRE library returned.|
Currently creates a regular expression using the RKRegexPCRELibrary PCRE library.
The returned pointer to an array of captureCount NSRange structures is automatically freed just as an autoreleased object would be released; you should copy any values that are required past the autorelease context in which they were created.
There is no need to free() the returned result as it will automatically be deallocated at the end of the current NSAutoreleasePool context.
A pointer to an autoreleased allocation of memory that is sizeof(NSRange) * [self captureCount] bytes long and contains captureCount NSRange structures with the location and length for the capture indexes of the first match in matchCharacters of length length within the range searchRange using options.
Returns NULL if the receiver does not match matchCharacters using the supplied arguments.
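Because the returned buffer is only valid within the current autorelease context, values needed later have to be copied into memory the caller owns. A minimal plain C sketch of such a copy follows; the IndexRange type is a hypothetical stand-in for NSRange, the function name is illustrative, and the sketch assumes captureCount is at least 1.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for NSRange. */
typedef struct { size_t location; size_t length; } IndexRange;

/* Duplicate captureCount ranges into caller-owned memory so they can
   outlive the buffer they came from. Returns NULL if `ranges` is NULL
   (no match) or if the allocation fails. The caller must free() the
   returned copy. */
IndexRange *copy_ranges(const IndexRange *ranges, size_t captureCount)
{
    assert(captureCount > 0);
    if (ranges == NULL) return NULL;

    IndexRange *copy = malloc(captureCount * sizeof *copy);
    if (copy != NULL) memcpy(copy, ranges, captureCount * sizeof *copy);
    return copy;
}
```

Unlike the autoreleased result, the copy is not freed automatically, so ownership is explicit: whoever calls the function is responsible for releasing it.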