RKRegex Class Reference

Inherits from: NSObject
RegexKit: 0.6.0 Release Notes
PCRE: 7.6
Availability: Available in Mac OS X v10.4 or later.
Declared in
  • RegexKit/RKRegex.h
Overview

The RKRegex class declares the programmatic interface between the RegexKit framework and the PCRE regular expression pattern matching library.

Some of the noteworthy features provided by the RKRegex class are:

Note:
Because compiled regular expressions are cached and reused, every regular expression is studied. See Studying a Pattern and pcre_study for more information.

The RKRegex class provides the low level primitives necessary to perform regular expression matching. The matching functions perform their work on raw byte buffers and provide match results in the form of NSRange structures containing the range of a match and the range of any matching subpatterns of a regular expression.

In addition to the low level matching primitives, the RKRegex class provides information about the underlying PCRE library, such as the version via the method PCREVersionString and the PCRE compile time options via PCREBuildConfig. It also provides methods for obtaining information about the named capture subpatterns, if any, of an instantiated RKRegex object's compiled regular expression.

In general, the RKRegex class is not used by end-user applications directly. Since the RKRegex class only provides low level primitives, end-user functionality is provided via various category extensions to common Foundation objects, such as the RegexKit framework additions to NSArray, NSDictionary, NSSet, NSString, and their mutable variants. Match enumeration is provided by the RKEnumerator class.

Unicode Support

Unicode strings are fully supported by RegexKit, both in the regular expression pattern and the string to search.

Foundation Additions RKCompileOption Requirements

The Foundation additions use Unicode strings exclusively for the buffer that RKRegex performs matches against. If a NSString has an encoding format other than ASCII, it is first converted to UTF8 before any matching can occur. Because of this, RKRegex objects must have the RKCompileOption flags RKCompileUTF8 and RKCompileNoUTF8Check set. Bytes with the most significant bit set in UTF8 encoded strings have special meaning that must be interpreted. Without these options set, PCRE treats the buffer as a collection of 8 bit bytes without the required UTF8 decoding.

The various Foundation additions will accept either a RKRegex or a NSString for the regex argument. If the supplied object is a NSString, it is automatically converted to a RKRegex object via the regexWithRegexString:options: method with an option argument of (RKCompileUTF8 | RKCompileNoUTF8Check).

If you are supplying an instantiated RKRegex object instead of using the NSString auto-compile functionality, the RKCompileOption options RKCompileUTF8 and RKCompileNoUTF8Check must be set.

Note:
Unlike the other Foundation additions, the NSData additions do not require and do not set the RKCompileUTF8 and RKCompileNoUTF8Check compile option flags.
Important:
If the required RKCompileOption options are not set, the supplied RKRegex is discarded and a new RKRegex object is created from its regular expression, with the required UTF8 flags bitwise ORed into any existing options.
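
The flag requirement above amounts to a bitmask test followed by a bitwise OR. The following is a minimal sketch in plain C (which also compiles as Objective-C), not RegexKit's implementation. The SketchCompileOption type, the flag names, and their bit values are hypothetical stand-ins for RKCompileOption, RKCompileUTF8, and RKCompileNoUTF8Check; the real values come from the RegexKit headers.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins for RKCompileOption flags. The bit values are
   placeholders and are NOT the values defined in RegexKit's headers. */
typedef uint32_t SketchCompileOption;
enum {
    SketchCompileCaseless    = 1u << 0,
    SketchCompileUTF8        = 1u << 1,
    SketchCompileNoUTF8Check = 1u << 2
};

#define SKETCH_REQUIRED_UTF8_FLAGS (SketchCompileUTF8 | SketchCompileNoUTF8Check)

/* True only when BOTH required UTF8 flags are present in options. */
bool sketch_has_required_utf8_flags(SketchCompileOption options) {
    return (options & SKETCH_REQUIRED_UTF8_FLAGS) == SKETCH_REQUIRED_UTF8_FLAGS;
}

/* Options to use when recompiling a regex that lacks the required flags:
   the existing options are kept and the UTF8 flags are ORed in. */
SketchCompileOption sketch_options_for_recompile(SketchCompileOption options) {
    SketchCompileOption result = options | SKETCH_REQUIRED_UTF8_FLAGS;
    assert(sketch_has_required_utf8_flags(result));
    return result;
}
```

With these placeholder values, a regex compiled with only the caseless flag fails the check, and the recompile options keep the caseless flag while adding both UTF8 flags.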

Important NSRange Differences

For the purposes of calculating character indexes, Foundation treats all strings as if they were UTF16 encoded. PCRE, on the other hand, uses UTF8 exclusively. This has important consequences when using strings that are encoded in anything but ASCII. It is important to understand that all of the Foundation additions, and the RKEnumerator class, calculate all character index values as UTF16 character indexes. Since PCRE can only operate on UTF8 encoded strings, this requires any NSRange values to be converted between the two character index spaces. This provides transparent interoperability with the rest of Foundation at the expense of having to perform the character index conversion.

However, the RKRegex methods use UTF8 character indexes for all NSRange values. This is an important distinction as NSRange values returned by RKRegex objects will result in undefined behavior if passed to NSString objects without converting to the equivalent UTF16 character indexes. The functions RKConvertUTF8ToUTF16RangeForString and RKConvertUTF16ToUTF8RangeForString can be used to perform the necessary conversions, if required.
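
To illustrate why the two index spaces differ, here is a minimal sketch in plain C of converting a UTF8 range to the equivalent UTF16 range. It is not the implementation of RKConvertUTF8ToUTF16RangeForString. It assumes that the UTF8 indexes are byte offsets into a valid UTF8 buffer and that the range begins and ends on character boundaries. SketchRange and the sketch_ function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for NSRange. */
typedef struct { size_t location; size_t length; } SketchRange;

/* Number of UTF16 code units contributed by one UTF8 byte: continuation
   bytes (10xxxxxx) add nothing, the lead byte of a four byte sequence
   becomes a surrogate pair (2 units), every other lead byte is 1 unit. */
static size_t sketch_utf16_units_for_byte(unsigned char byte) {
    if ((byte & 0xC0) == 0x80) return 0;
    if (byte >= 0xF0) return 2;
    return 1;
}

/* UTF16 index that corresponds to byteOffset in a valid UTF8 buffer. */
size_t sketch_utf16_index_for_utf8_offset(const char *utf8, size_t byteLength, size_t byteOffset) {
    assert(utf8 != NULL && byteOffset <= byteLength);
    size_t units = 0;
    for (size_t i = 0; i < byteOffset; i++) {
        units += sketch_utf16_units_for_byte((unsigned char)utf8[i]);
    }
    return units;
}

/* Converts a byte range in the UTF8 buffer to a UTF16 code unit range. */
SketchRange sketch_utf8_range_to_utf16(const char *utf8, size_t byteLength, SketchRange utf8Range) {
    assert(utf8Range.location <= byteLength);
    assert(utf8Range.length <= byteLength - utf8Range.location);
    size_t start = sketch_utf16_index_for_utf8_offset(utf8, byteLength, utf8Range.location);
    size_t end = sketch_utf16_index_for_utf8_offset(utf8, byteLength, utf8Range.location + utf8Range.length);
    SketchRange result = { start, end - start };
    return result;
}
```

For pure ASCII input the two ranges are identical, which is why the difference only shows up once a string contains characters outside ASCII.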

The reasoning behind this is that the RKRegex class provides low level access to the PCRE engine, whereas the Foundation additions provide abstracted access to the underlying pattern matching engine. There are still many useful tasks that can be performed with low-level access, such as not enabling RKCompileUTF8 and matching raw binary byte buffers. Therefore, the RKRegex class tries to provide unadulterated access to the PCRE matching engine for those users who have special requirements.

NSCoding Support

The RKRegex class fully supports the NSCoding protocol. When a RKRegex is archived, the regular expression string used to create the receiver is coded, along with any RKCompileOption options. In addition to these two main items, the version and RKBuildConfig flags are also encoded to aid in debugging any unarchiving issues.

If problems are encountered when attempting to initialize a coded RKRegex regular expression, a NSInvalidUnarchiveOperationException is raised. The userInfo portion of the exception contains additional information regarding the failed attempt. Some of the additional information includes any difference in the archiving RKRegex PCRE version, any unknown or unsupported RKCompileOption flags for the current RegexKit, and any differences in RKBuildConfig flags.

Adopted Protocols

NSCoding
NSCopying

Tasks

PCRE Library Information
Regular Expression Cache
Creating Regular Expressions
Instantiated Regular Expression Information
Named Capture Information
Matching Regular Expressions

Class Methods

Returns a RKBuildConfig mask representing features and configuration settings of the PCRE library when it was initially built.
+ (RKBuildConfig)PCREBuildConfig;
Return Value
A mask of RKBuildConfig flags combined with the C bitwise OR operator representing features or defaults of the PCRE library that were set when the library was built.
Returns the PCRE library major version.
+ (int32_t)PCREMajorVersion;
Return Value
Returns the major version in PCREVersionString as an int32_t.
Returns the PCRE library minor version.
+ (int32_t)PCREMinorVersion;
Return Value
Returns the minor version in PCREVersionString as an int32_t.
Returns a NSString of the PCRE library version.
+ (NSString *)PCREVersionString;
Discussion
The underlying PCRE library will typically return a version string similar to "7.0 18-Dec-2006".
Return Value
Returns a NSString encapsulated copy of the characters returned by the pcre_version() library function.
Returns a Boolean value that indicates whether regexString and options are valid.
+ (BOOL)isValidRegexString:(NSString * const)regexString options:(const RKCompileOption)options;
Parameters
  • regexString
    The regular expression to check.
  • options
    A mask of options specified by combining RKCompileOption flags with the C bitwise OR operator.
Discussion
Invokes regexWithRegexString:options: with regexString and options within a @try / @catch block. If the result is non-nil, then the regexString is considered valid and YES is returned, otherwise NO is returned. Any exceptions thrown during validation will be caught by isValidRegexString:options: and NO will be returned.
Return Value
Returns YES if valid, NO otherwise.
Returns the current regular expression cache.
+ (RKCache *)regexCache;
Convenience method for an autoreleased RKRegex object.
+ (id)regexWithRegexString:(NSString * const restrict)regexString library:(NSString * const restrict)libraryString options:(const RKCompileOption)libraryOptions error:(NSError **)error;
Discussion
Currently the only supported regular expression matching library is RKRegexPCRELibrary.
Return Value
Returns an autoreleased RKRegex object if successful, nil otherwise.
Convenience method for an autoreleased RKRegex object.
+ (id)regexWithRegexString:(NSString * const)regexString options:(const RKCompileOption)options;
Discussion
Currently creates a regular expression using the RKRegexPCRELibrary PCRE library.
Return Value
Returns an autoreleased RKRegex object if successful, nil otherwise.

Instance Methods

Returns the number of captures that the receiver's regular expression contains.
- (RKUInteger)captureCount;
Discussion
Every regular expression has at least one capture representing the entire range that the regular expression matched. Additional subcaptures are created with () pairs.
Returns the capture index for captureNameString, or the first capture index of captureNameString if compiled with RKCompileDupNames.
- (RKUInteger)captureIndexForCaptureName:(NSString * const)captureNameString;
Parameters
  • captureNameString
    The name of the desired capture index.
    Important:
    Raises a NSInvalidArgumentException if captureNameString is nil or is not a valid capture name for the receiver's regular expression.
Returns the capture index for captureNameString from a match operation, or the capture index of the first successful match for captureNameString if RKCompileDupNames is used and there are multiple instances of captureNameString in the receiver's regular expression.
- (RKUInteger)captureIndexForCaptureName:(NSString * const restrict)captureNameString inMatchedRanges:(const NSRange * const restrict)matchedRanges;
Discussion

Used primarily when a regular expression is compiled with RKCompileDupNames, or when the (?J) option has been set, to determine the capture index for the first successful match in the matchedRanges result from getRanges:withCharacters:length:inRange:options:. If none of the captures named captureNameString matched, NSNotFound is returned.

May be used when a regular expression is not compiled with RKCompileDupNames or there is only a single instance of captureNameString, in which case the result will be the capture index of captureNameString only if captureNameString successfully matched, otherwise NSNotFound is returned.

Return Value
The first capture index that matched in matchedRanges for captureNameString, otherwise NSNotFound is returned if there were no successful matches for any of the capture indexes of captureNameString.
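
The lookup described above can be sketched in plain C as a scan over the capture indexes that share a name. This is a minimal sketch, not RegexKit's implementation. It assumes that a capture which did not participate in the match has a location equal to NSNotFound in matchedRanges. SketchRange, SKETCH_NOT_FOUND, and the captureNames array (one entry per capture index, NULL for unnamed captures) are hypothetical stand-ins.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define SKETCH_NOT_FOUND ((size_t)-1)   /* stand-in for NSNotFound */

/* Hypothetical stand-in for NSRange. */
typedef struct { size_t location; size_t length; } SketchRange;

/* Returns the lowest capture index whose name equals `name` and whose
   range shows a successful match, or SKETCH_NOT_FOUND if no capture with
   that name matched. captureNames[i] is NULL for unnamed captures. */
size_t sketch_first_matched_capture(const char *const *captureNames,
                                    const SketchRange *matchedRanges,
                                    size_t captureCount,
                                    const char *name) {
    assert(captureNames != NULL && matchedRanges != NULL && name != NULL);
    for (size_t i = 0; i < captureCount; i++) {
        if (captureNames[i] == NULL || strcmp(captureNames[i], name) != 0) {
            continue;   /* unnamed capture or a different name */
        }
        if (matchedRanges[i].location != SKETCH_NOT_FOUND) {
            return i;   /* first capture with this name that matched */
        }
    }
    return SKETCH_NOT_FOUND;
}
```

Unlike the method documented here, the sketch returns SKETCH_NOT_FOUND for a name that does not exist in the pattern instead of raising an exception.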
Returns the capture index for captureNameString from a match operation, or the capture index of the first successful match for captureNameString if RKCompileDupNames is used and there are multiple instances of captureNameString in the receiver's regular expression.
- (RKUInteger)captureIndexForCaptureName:(NSString * const restrict)captureNameString inMatchedRanges:(const NSRange * const restrict)matchedRanges error:(NSError **)error;
Discussion

This method is similar to captureIndexForCaptureName:inMatchedRanges: except that it optionally returns a NSError object for error conditions instead of throwing an exception. The error parameter may be set to NULL if information about the error is not required.

Important:
Exceptions are still thrown for invalid argument conditions, such as passing nil for captureNameString or matchedRanges.
Returns a NSArray which maps the capture names in the receiver's regular expression to their equivalent capture index values.
- (NSArray *)captureNameArray;
Discussion

If the regular expression of the receiver uses named subcaptures (for example, (?<year>(\d\d)?\d\d)), then for each capture name there exists a corresponding capture index. A NSArray is created with captureCount elements and for every capture name the corresponding array index is set to a NSString of the capture name. If there is no capture name for an index, a NSNull is used instead.

This method returns nil if the receiver's regular expression does not contain any named subcaptures.

Return Value
Returns a NSArray which maps the capture names in the receiver's regular expression to their equivalent capture index values, or nil if the receiver's regular expression does not contain any capture names.
Returns the capture name for the captured index.
- (NSString *)captureNameForCaptureIndex:(const RKUInteger)captureIndex;
Parameters
  • captureIndex
    The capture index of the desired capture name.
    Important:
    Raises a NSInvalidArgumentException if captureIndex is not valid for the receiver's regular expression.
Return Value
Returns the capture name for captureIndex, otherwise nil if captureIndex does not have a name associated with it.
Returns the RKCompileOption options used to create the receiver.
- (RKCompileOption)compileOption;
Return Value
A mask of RKCompileOption flags combined with the C bitwise OR operator representing the options used in compiling the regular expression of the receiver.
Low level regular expression matching method.
- (RKMatchErrorCode)getRanges:(NSRange * const restrict)ranges withCharacters:(const void * const restrict)charactersBuffer length:(const RKUInteger)length inRange:(const NSRange)searchRange options:(const RKMatchOption)options;
Parameters
  • ranges
    Caller supplied pointer to an array of NSRange structures with at least captureCount elements.
    Warning:
    Failure to provide a correctly sized ranges array will result in memory corruption.
  • charactersBuffer
    Pointer to the start of characters to search.
    Important:
    Raises a NSInvalidArgumentException if ranges or charactersBuffer is NULL.
  • length
    Length of charactersBuffer.
  • searchRange
    The range within charactersBuffer to match.
    Important:
    Raises a NSRangeException if length or searchRange is invalid or represents an invalid combination.
  • options
    A mask of options specified by combining RKMatchOption flags with the C bitwise OR operator.
Discussion

This method is the low level matching primitive to the PCRE library.

getRanges:withCharacters:length:inRange:options: allocates all of the memory needed to perform the regular expression matching and store any temporary results on the stack. The match results, if any, are translated from the PCRE library format to the equivalent NSRange format and stored in the caller supplied ranges NSRange array. For nearly all cases this means that there is no associated malloc() overhead involved. See rangesForCharacters:length:inRange:options:, which creates an autorelease buffer to store the results, if the caller is unable to provide a suitable buffer.

It is important to note that setting searchRange.location and adding the equivalent offset to charactersBuffer are not the same thing. The value of charactersBuffer marks the hard start of the buffer, whereas a positive searchRange.location makes the characters from charactersBuffer up to searchRange.location available to the matching engine. This is an important distinction for some types of regular expressions, such as those that use lookbehind (i.e., (?<=)), which may need to examine characters that are not strictly within searchRange.
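
The difference can be illustrated without PCRE. The following minimal sketch in plain C hand-rolls the behavior of a pattern such as (?<=foo)bar: a match is only reported when the bytes before it, counted from the hard start of the buffer, equal the prefix. The function name and SKETCH_NOT_FOUND are hypothetical, and this is not how RKRegex or PCRE is implemented.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define SKETCH_NOT_FOUND ((size_t)-1)   /* stand-in for NSNotFound */

/* Finds the first occurrence of `needle` at or after searchStart that is
   immediately preceded by `prefix`, mimicking (?<=prefix)needle. Bytes
   before searchStart may be examined, but never bytes before `buffer`. */
size_t sketch_find_after_prefix(const char *buffer, size_t length, size_t searchStart,
                                const char *prefix, const char *needle) {
    assert(buffer != NULL && prefix != NULL && needle != NULL);
    assert(searchStart <= length);
    size_t prefixLen = strlen(prefix);
    size_t needleLen = strlen(needle);
    for (size_t i = searchStart; i + needleLen <= length; i++) {
        if (i < prefixLen) {
            continue;   /* too close to the hard start for the lookbehind */
        }
        if (memcmp(buffer + i - prefixLen, prefix, prefixLen) == 0 &&
            memcmp(buffer + i, needle, needleLen) == 0) {
            return i;
        }
    }
    return SKETCH_NOT_FOUND;
}
```

Searching "foobar" with a search start of 3 finds "bar" at index 3, because "foo" is still visible behind the search start. Passing the buffer advanced by 3 bytes with a search start of 0 finds nothing, because the preceding "foo" now lies before the hard start of the buffer.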

Return Value
Returns the number of captures matched (>0) on success, otherwise a RKMatchErrorCode (<0) on failure. The values in ranges are only modified on a successful match.
Returns a RKRegex object initialized with the regular expression regexString using the regular expression pattern matching library with RKCompileOption options.
- (id)initWithRegexString:(NSString * const restrict)regexString library:(NSString * const restrict)library options:(const RKCompileOption)libraryOptions error:(NSError **)error;
Parameters
  • regexString
    The regular expression to compile.
  • library
    The regular expression pattern matching library to use. See Regular Expression Libraries for a list of valid constants.
    Note:
    Currently the only supported regular expression matching library is the RKRegexPCRELibrary PCRE library.
  • libraryOptions
    A mask of options specified by combining RKCompileOption flags with the C bitwise OR operator.
  • error
    An optional parameter that, if set, will contain a NSError object describing the problem when an error occurs. This may be set to NULL if information about any errors is not required.
Discussion

Unlike initWithRegexString:options:, this method does not throw an exception on errors. Instead, a NSError object is created and returned via the optional error parameter.

Important:
Exceptions are still thrown for invalid argument conditions, such as passing nil for regexString or library.
Return Value
Returns a RKRegex object if successful, nil otherwise.
Returns a RKRegex object initialized with the regular expression regexString with RKCompileOption options.
- (id)initWithRegexString:(NSString * const restrict)regexString options:(const RKCompileOption)options;
Parameters
  • regexString
    The regular expression to compile.
    Important:
    Raises a NSInvalidArgumentException if regexString is nil.
  • options
    A mask of options specified by combining RKCompileOption flags with the C bitwise OR operator.
    Important:
    Raises a RKRegexSyntaxErrorException if regexString in combination with options is not a valid regular expression.
Discussion

Raises RKRegexSyntaxErrorException if regexString in combination with options is not a valid regular expression. The exception provides a userInfo dictionary containing the following keys and information:

Table 1 RKRegexSyntaxErrorException userInfo dictionary information.
Key | Object Type | Description
regexString | NSString | The regexString regular expression that caused the exception.
regexStringErrorLocation | NSNumber | The location of the character that caused the syntax error.
regexAttributedString | NSAttributedString | The regexString regular expression with a NSBackgroundColorAttributeName set to [NSColor redColor] for the character that caused the error along with the NSToolTipAttributeName attribute (if supported) set to errorString.
errorString | NSString | The error string that the PCRE library returned.
RKCompileOption | NSNumber | The RKCompileOption that was passed with regexString.
RKCompileOptionString | NSString | A human readable C bitwise OR equivalent string of RKCompileOption options.
RKCompileOptionArray | NSArray | The human readable equivalent of the individual C bitwise RKCompileOption options flags in a NSArray.
RKCompileErrorCode | NSNumber | The RKCompileErrorCode that the PCRE library returned.
RKCompileErrorCodeString | NSString | A human readable equivalent of the RKCompileErrorCode name that the PCRE library returned.

Currently creates a regular expression using the RKRegexPCRELibrary PCRE library.

Return Value
Returns a RKRegex object if successful, nil otherwise.
Returns a Boolean value that indicates whether captureNameString is a valid capture name for the receiver.
- (BOOL)isValidCaptureName:(NSString * const)captureNameString;
Parameters
  • captureNameString
    A NSString of the name of the desired capture index.
Returns a Boolean value that indicates whether matchCharacters of length in searchRange with options is matched by the receiver.
- (BOOL)matchesCharacters:(const void * const restrict)matchCharacters length:(const RKUInteger)length inRange:(const NSRange)searchRange options:(const RKMatchOption)options;
Parameters
  • matchCharacters
    The characters to match against. This value must not be NULL.
    Important:
    Raises a NSInvalidArgumentException if matchCharacters is NULL.
  • length
    The number of characters in matchCharacters.
  • searchRange
    The range within matchCharacters to match against.
    Important:
    Raises a NSRangeException if any part of searchRange lies beyond the end of matchCharacters.
  • options
    A mask of options specified by combining RKMatchOption flags with the C bitwise OR operator.
Discussion
Invokes rangeForCharacters:length:inRange:captureIndex:options: for captureIndex of 0 with the specified parameters and returns NO if the result is NSNotFound, YES otherwise.
Return Value
YES if the receiver matches matchCharacters of length length within searchRange with options, otherwise NO.
Returns the range of captureIndex for the first match in matchCharacters of length length inside searchRange with options matched by the receiver.
- (NSRange)rangeForCharacters:(const void * const restrict)matchCharacters length:(const RKUInteger)length inRange:(const NSRange)searchRange captureIndex:(const RKUInteger)captureIndex options:(const RKMatchOption)options;
Parameters
  • matchCharacters
    The characters to match against. This value must not be NULL.
    Important:
    Raises a NSInvalidArgumentException if matchCharacters is NULL.
  • length
    The number of characters in matchCharacters.
  • searchRange
    The range within matchCharacters to match against.
    Important:
    Raises a NSRangeException if any part of searchRange lies beyond the end of matchCharacters.
  • captureIndex
    The index of the capture subpattern of the receiver's regular expression whose match range is returned.
  • options
    A mask of options specified by combining RKMatchOption flags with the C bitwise OR operator.
Discussion
Important:
Raises a NSInvalidArgumentException if captureIndex is not valid for the receiver's regular expression.
Return Value
A NSRange structure giving the location and length of captureIndex for the first match in matchCharacters of length length inside searchRange with options that is matched by the receiver. Returns {NSNotFound, 0} if the receiver does not match matchCharacters.
Returns a pointer to an array of NSRange structures that correspond to the capture indexes of the receiver for the first match in matchCharacters of length length in searchRange with options.
- (NSRange *)rangesForCharacters:(const void * const restrict)matchCharacters length:(const RKUInteger)length inRange:(const NSRange)searchRange options:(const RKMatchOption)options;
Parameters
  • matchCharacters
    The characters to match against. This value must not be NULL.
    Important:
    Raises a NSInvalidArgumentException if matchCharacters is NULL.
  • length
    The number of characters in matchCharacters.
  • searchRange
    The range within matchCharacters to match against.
    Important:
    Raises a NSRangeException if any part of searchRange lies beyond the end of matchCharacters.
  • options
    A mask of options specified by combining RKMatchOption flags with the C bitwise OR operator.
Discussion

The returned pointer to an array of captureCount NSRange structures is automatically freed just as an autoreleased object would be released; you should copy any values that are required past the autorelease context in which they were created.

There is no need to free() the returned result as it will automatically be deallocated at the end of the current NSAutoreleasePool context.

Example code

// Assumes that regexObject and characters exist.
NSRange *captureRanges = NULL;
RKUInteger length = strlen(characters);

captureRanges = [regexObject rangesForCharacters:characters length:length inRange:NSMakeRange(0, length) options:RKMatchNoOptions];

if(captureRanges != NULL) {
  RKUInteger x;
  // Log each capture range; casts keep the format specifiers portable.
  for(x = 0; x < [regexObject captureCount]; x++) {
    NSLog(@"Capture index %lu location %lu, length %lu", (unsigned long)x, (unsigned long)captureRanges[x].location, (unsigned long)captureRanges[x].length);
    NSLog(@"NSRange string %@", NSStringFromRange(captureRanges[x]));
  }
}
Return Value

A pointer to an autoreleased allocation of memory that is sizeof(NSRange) * [self captureCount] bytes long and contains captureCount NSRange structures with the location and length for the capture indexes of the first match in matchCharacters of length length within the range searchRange using options.

Returns NULL if the receiver does not match matchCharacters using the supplied arguments.

Returns the regular expression used to create the receiver.
- (NSString *)regexString;