RegexKit Programming Guide

An Objective-C Framework for Regular Expressions using the PCRE Library

Introduction

This document introduces the RegexKit.framework for the Objective-C language and demonstrates how to use regular expressions in your project. The RegexKit.framework enables easy access to regular expressions by providing a number of additions to standard Foundation classes, such as NSArray, NSDictionary, NSSet, and NSString, along with their mutable variants. The RegexKit.framework acts as a bridge between the Foundation classes and the PCRE (Perl Compatible Regular Expression) library, available at www.pcre.org.

Highlights

Multithreading safe.
Automatically caches compiled regular expressions.
For Mac OS X, the framework is built as a Universal Binary.
Uses Core Foundation on Mac OS X for greater speed.
PCRE library built in, no need to build or install separately.
GNUstep support.

Prerequisites

An Objective-C development environment.
OpenStep Foundation compatible framework, such as Mac OS X Cocoa or GNUstep.
For Mac OS X, 10.4 or greater is required.
Some experience with regular expressions.

Documentation Overview

Regular Expressions
The RegexKit Classes
NSString Additions
DTrace Probe Points in RegexKit
Adding the RegexKit.framework to your Project
License Information

Regular Expressions

Regular expressions can be quite complex. This section is in no way meant to be a comprehensive overview of regular expressions; it is only a pragmatic introduction highlighting some of the features that can be put to immediate use. Since this framework uses the PCRE library to perform the actual regular expression matching, you should familiarize yourself with the specifics of the PCRE Regular Expression Syntax.

Important:

The C language assigns special meaning to the \ character when inside a quoted " " string in your source code. The \ character is the escape character, and the character that follows has a different meaning than normal. The most common example of this is \n, which translates into the New Line character. Because of this, you are required to 'escape' any use of \ by prepending it with another \. In practical terms, this means doubling every \ in a regular expression that appears inside a quoted " " string in your source code, and unfortunately \ is quite common in regular expressions. Failure to do so will result in numerous warnings from the compiler about unknown escape sequences.
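As a small illustration of the doubling rule, the following sketch writes the pattern \d+\.\d+ as a string literal and reports how many characters the regular expression engine actually receives. The function name RKGuidePatternLength is hypothetical and exists only for this example; nothing beyond Foundation is assumed.

```objective-c
#import <Foundation/Foundation.h>

/* Hypothetical helper, for illustration only. */
static unsigned long RKGuidePatternLength(void)
{
  /* In source code:  \\d+\\.\\d+   (11 characters between the quotes)
     After compiling: \d+\.\d+      (8 characters, which is what the regex engine sees) */
  NSString *pattern = @"\\d+\\.\\d+";

  return (unsigned long)[pattern length];
}
```

Calling RKGuidePatternLength() returns 8, since each doubled backslash in the source collapses to a single backslash in the compiled string.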

Characters and Metacharacters
Pattern	Description
.	Match any character except New Line
\	Escape the next character
^	Match the beginning of a line
$	Match the end of a line
|	Alternative
( )	Capture subpattern grouping
[ ]	Character class

Generic Character Types
Pattern	Description
\d	Any decimal digit
\D	Any character that is not a decimal digit
\s	Any whitespace character
\S	Any character that is not a whitespace character
\w	Any word character
\W	Any non-word character

Common Quantifiers
Pattern	Description
*	Match 0 or more times
+	Match 1 or more times
?	Match 1 or 0 times
{n}	Match exactly n times
{n,}	Match at least n times
{n,m}	Match at least n but not more than m times

The Basics

One of the basic primitives of regular expressions is the pair What to match and How many times to match, or Quantity. Some what-to-match sequences are so common, such as the whitespace characters, or any alphanumeric character except the whitespace characters, that they have special shorthand notation in regular expressions. See the table Generic Character Types for some of the more common shorthands.

As an example, suppose you wanted to match a number of the form 123.45. A very simple pattern that would match this is \d+\.\d+. Note that the decimal point is escaped in the regular expression with a \ because we want to match the literal character . and not its normal regular expression meaning, which is to match any character. Without the escape, the regular expression would also match 123z45, which is clearly not what we want.
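The difference between the escaped and unescaped decimal point can be checked with isMatchedByRegex:, which is covered later in this guide. The sketch below assumes the framework's umbrella header is RegexKit/RegexKit.h; the two function names are hypothetical.

```objective-c
#import <Foundation/Foundation.h>
#import <RegexKit/RegexKit.h>

/* Hypothetical helpers, for illustration only. */

/* The decimal point is escaped, so only a literal '.' is accepted between the digits. */
static BOOL RKGuideMatchesEscapedDot(NSString *text)
{
  return [text isMatchedByRegex:@"\\d+\\.\\d+"];
}

/* The decimal point is NOT escaped, so any character is accepted between the digits. */
static BOOL RKGuideMatchesUnescapedDot(NSString *text)
{
  return [text isMatchedByRegex:@"\\d+.\\d+"];
}
```

With these definitions, RKGuideMatchesEscapedDot(@"123.45") is YES and RKGuideMatchesEscapedDot(@"123z45") is NO, while RKGuideMatchesUnescapedDot(@"123z45") is YES.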

Extracting Part of a Match

Often we're not interested in whether or not a regular expression matches necessarily, but we are interested in certain parts of what matched. From the previous example, suppose we were interested in the number before the decimal point, and the number after it. To do that, we use another regular expression feature called subpatterns.

Capture Subpatterns

Subpatterns are specified in a regular expression with a pair of parentheses, ( ), and have the following syntax:

(pattern)

pattern
The regular expression pattern to match.

(\d+)\.(\d+)

To update our previous example, the regular expression becomes (\d+)\.(\d+). The regular expression engine then provides the range of matching characters that corresponds to each subpattern, which is called a capture.

Captures are numbered sequentially, beginning at zero, and then in the order of appearance in the regular expression. Capture 0 (zero) has a special meaning, which is the entire range of characters that the regular expression matched. In our example, capture 1 (one) corresponds to the number before the decimal point, and capture 2 (two) corresponds to the number after the decimal point.

Named Capture Subpatterns

Complex regular expressions might contain a number of capture subpatterns, and keeping track of the correct capture subpattern can be error prone. To make things easier, PCRE provides syntax to name a subpattern, which is:

(?<name>pattern)

name
The optional name to give the capture subpattern.
pattern
The regular expression pattern to match.

(?<total>\d+\.\d+)

Nested Capture Subpatterns

Capture subpatterns can be nested to an arbitrary depth as well:

(?<name>pattern (?<name>pattern) )

name
The optional name to give the capture subpattern.
pattern
The regular expression pattern to match.

(?<total>(?<dollars>\d+)\.(?<cents>\d+))

In the example given, the capture name dollars would include the digits up to the decimal point, cents would include the digits after the decimal point, and total would include the decimal point along with the digits before and after the decimal point.

It Only Gets More Complicated from Here

Some people, when confronted with a problem, think "I know, I'll use regular expressions." Now they have two problems. Jamie Zawinski

Past this point things start to get more complicated quickly. Using features that are not covered here, it is possible to craft a single regular expression that is capable of matching a wide range of "numbers", from the most basic representation of just numeric digits, to optionally accepting digits after the decimal point, but always requiring at least one digit before the decimal point (i.e., .234 would not be valid), with an optional scientific exponent tacked on the end. For details on these more advanced features you will need to read the PCRE Regular Expression Syntax documentation.

If you're new to regular expressions, hopefully the three basic points (what, how much, which part) covered here are enough to be useful to you. Regular expressions make most text processing tasks much easier. A common need is to strip any leading or trailing whitespace in a string, and a regular expression like \s*(.*\S+)\s* will do just that. Nearly any task that requires an NSScanner can be done faster and with less code with a regular expression.

However, regular expressions are not the solution to every pattern matching problem. Regular expressions tend to be very difficult for other programmers to read and understand, let alone maintain, and are often described as "write only." Even after careful review, there can be spectacularly large, but hidden, differences between what a regular expression actually matches and what its author intended. This can have enormous repercussions in some usage scenarios, such as depending on a regular expression for validation in a security context. As with all tools, make sure it's the right one for the problem you're trying to solve.

PCRE Regular Expression Syntax
PCRE Compatibility with Perl
NSString RegexKit Additions
The Perl Language Regular Expressions
Jeffrey Friedl's Mastering Regular Expressions
Wikipedia Regular expression

The RegexKit Classes

RKRegex

The RKRegex class forms the core of the RegexKit.framework. It provides basic primitives for performing regular expression matches on raw byte buffers and obtaining the results of those matches in the form of NSRange structures. It also provides methods for translating the name of a capture subpattern to its equivalent capture index number.

While you can create RKRegex objects directly, all of the extensions to the Foundation classes accept either an instantiated RKRegex object or an NSString with a regular expression pattern, which will automatically be converted to an RKRegex object for you. Usually you will only need to manually instantiate RKRegex objects when you require an unusual option to be set that can't be altered from within the regular expression pattern itself.

There are two methods used in the creation of RKRegex objects:

RKRegex Instantiation Methods
Method	Description
initWithRegexString:options:	Designated initializer. Primary means of creating RKRegex objects.
regexWithRegexString:options:	Convenience method that allocates, initializes, and returns an autoreleased RKRegex object.
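A minimal sketch of creating an RKRegex explicitly and then handing it to one of the NSString additions might look like the following. The option constant RKCompileCaseless is an assumption taken from the RegexKit headers rather than from this guide, and it is used only because it is easy to demonstrate (PCRE also allows case-insensitive matching to be requested inside the pattern with (?i)). The function name is hypothetical.

```objective-c
#import <Foundation/Foundation.h>
#import <RegexKit/RegexKit.h>

/* Hypothetical helper, for illustration only. */
static BOOL RKGuideStartsWithProductLabel(NSString *text)
{
  /* RKCompileCaseless is assumed to be the RegexKit name for PCRE's caseless option. */
  RKRegex *regex = [RKRegex regexWithRegexString:@"^product:" options:RKCompileCaseless];

  /* The Foundation additions accept either an NSString pattern or an RKRegex object. */
  return [text isMatchedByRegex:regex];
}
```

Under that assumption, RKGuideStartsWithProductLabel(@"PRODUCT: Acme Potato Peeler") would be YES even though the pattern is written in lower case.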

For each regular expression match, a C style array of NSRange results is created. The index value of the NSRange array corresponds to the matching regular expression subcapture index value. Index 0 (zero) is always created and represents the entire range that the regular expression matched. Subsequent index values, up to the total capture indexes in the regular expression, represent the match range for the equivalent subcapture.

Important:

The following methods return only the first matching result for the arguments given. Additional matches, if any, require another call starting at the end of the last match.

RKRegex Matching Primitives
Method	Result	Description
getRanges: withCharacters: length: inRange: options:	RKMatchErrorCode	Copies an array of NSRange structures to a user supplied buffer that is at least [aRegex captureCount] big.
matchesCharacters: length: inRange: options:	BOOL	A boolean YES or NO depending on whether or not the regular expression matched the string.
rangeForCharacters: length: inRange: captureIndex: options:	NSRange	The NSRange for the captureIndex: subcapture
rangesForCharacters: length: inRange: options:	NSRange *	An autoreleased block of memory containing a C array of NSRange structures, one for each capture. Accessed via NSRange[0] through NSRange[Max Subcapture].

Since these methods work on raw byte buffers only, and only return the range(s) of a match, they generally aren't used by end user programs directly. The RegexKit.framework provides a number of extensions to common Foundation classes, such as NSString, that are much easier to use than the raw results provided by the RKRegex class.

Note:

The getRanges:withCharacters:length:inRange:options: method is the analog of the PCRE library's pcre_exec function. The primary difference is that getRanges:withCharacters:length:inRange:options: automatically sizes a buffer from the stack to hold the temporary results from pcre_exec, then, while copying to the user supplied buffer, converts those results to their equivalent NSRange results.

RKCache

The RKCache class provides the framework's caching functionality. It provides a multithreading safe means of caching and retrieving immutable objects, notably RKRegex objects. The RKRegex class makes heavy use of the RKCache class to improve the performance of the framework.

Important:

The RKCache class is not meant to be used by end user programs.

Regular Expression Caching

The PCRE library requires that the text form of a regular expression be parsed and compiled to an internal form that is usable by the matching routines. While this is a relatively quick operation, it is not instantaneous. The RegexKit.framework takes advantage of the fact that once a regular expression has been compiled, the compiled form is immutable and can be reused again and again.

The RegexKit.framework maintains a global RKCache of instantiated RKRegex objects. When an RKRegex is allocated and sent an initWithRegexString:options: message, it first checks the global cache. If the cache contains a match, the cached result is returned instead. Otherwise a new RKRegex is created and added to the global cache. This allows the RegexKit.framework to convert a regular expression pattern in NSString form into an RKRegex very quickly.

Caching happens automatically, and is fully multithreading safe. Information about the cache is available by invoking the status method. For example:

NSString *cacheStatus = [[RKRegex cache] status];

// Example cacheStatus:
// @"Enabled = Yes, Cleared count = 0, Cache count = 27, Hit rate = 96.27%, Hits = 697, Misses = 27, Total = 724";

The Cache in Action

A common usage scenario is to apply the same regular expression to every line in a text file. The cache removes the need to anticipate when it makes sense to create an RKRegex object once and reuse it, or just make use of the convenience methods, which would normally recreate the same regular expression for each invocation. This also results in less clutter in your code.

The following example iterates over all the items in stringArray and strips any leading or trailing whitespace the original string had and skips strings that are empty or contain only whitespace characters. Since the cache eliminates the need to explicitly manage RKRegex objects, the following programming style can be used without a performance penalty:

/* Backslashes, '\', need to be escaped with a backslash inside of C strings. */

NSEnumerator *stringEnumerator = [stringArray objectEnumerator]; /* Assumes that stringArray exists. */
NSString *atString = NULL;

while((atString = [stringEnumerator nextObject]) != NULL) {
  NSString *cleanString = NULL; /* Will contain the result from the capture subpattern extraction. */

  /* Empty or whitespace only strings do not match and are skipped. */
  if([atString getCapturesWithRegexAndReferences:@"\\s*(.*\\S+)\\s*", @"$1", &cleanString, nil] == NO) { continue; }

  // cleanString now contains a pointer to a new autoreleased NSString that
  // contains atString without any leading or trailing whitespace.
}

RegexKit Compared to NSScanner

In many cases creating a regular expression object ahead of time to use repeatedly is simply not feasible. Typically the details required to create a long lived regular expression object to use in repeated matchings are not available to the caller due to object abstraction. The following implements the NSScanner matching example from String Programming Guide for Cocoa. It is significantly more compact than the 27 lines used to implement the same functionality with NSScanner and is dramatically faster as well. Once the initial regular expression object has been cached there is virtually no overhead except for the actual matching itself. The underlying methods used by isMatchedByRegex: do not create any additional objects or memory allocations. Only stack space is required to determine if there was a successful match or not. No special steps are required by the caller of scanProductString: or by its implementor to achieve the best performance due to automatic caching.

/* Example string to match: @"Product: Acme Potato Peeler; Cost: 0.98" */

- (BOOL)scanProductString:(NSString *)string
{
  return([string isMatchedByRegex:@"Product: .+; Cost: \\d+\\.\\d+"]);
}

RKEnumerator

The RKEnumerator class provides the means to enumerate all the matches of a regular expression in an NSString.

In addition to basic enumeration, the RKEnumerator class provides a number of methods to extract, convert, and format the currently enumerated match that are similar to the various NSString additions. These include methods such as getCapturesWithReferences:, stringWithReferenceString:, and stringWithReferenceFormat:.

Foundation Extensions

In addition to the above, the RegexKit.framework makes a number of Objective-C category additions to the NSArray, NSDictionary, NSSet, and NSString Foundation classes. These additions are the primary means of using the RegexKit.framework.

NSString Additions

Since regular expressions are often involved in text manipulation tasks, the NSString RegexKit Additions are covered in their own section, NSString Additions.

NSArray, NSDictionary, and NSSet Additions

The remaining RegexKit additions are to the Foundation Collection class objects, NSArray, NSDictionary, and NSSet. Most of the additions are essentially the same and involve either querying a collection to determine if it contains an object that is matched by a regular expression, obtaining the count of objects matched by a regular expression, or creating a new collection from the objects matched by a regular expression.

The NSDictionary RegexKit Additions allow you to query on either keys matched by a regular expression, or objects matched by a regular expression. In both cases, an NSArray of either the matching keys or objects can be returned, or a new NSDictionary that contains the results from the regular expression match.

Along with the immutable collection classes, the mutable collection classes receive a number of additions as well. These are mostly convenience methods, allowing you to remove items from a collection based on whether or not an item is matched by a regular expression, or allowing you to add items matched by a regular expression from a second collection.
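As a rough sketch of the three kinds of collection query described above (contains, count, and new collection), the following uses an NSArray. This guide does not list the collection method names, so containsObjectMatchingRegex:, countOfObjectsMatchingRegex:, and arrayByMatchingObjectsWithRegex: are assumptions that should be checked against the RegexKit headers. The function names are hypothetical.

```objective-c
#import <Foundation/Foundation.h>
#import <RegexKit/RegexKit.h>

/* Hypothetical helpers, for illustration only.
   The three NSArray method names below are assumed, not taken from this guide. */

/* YES if at least one string in the array looks like 123.45. */
static BOOL RKGuideHasDecimal(NSArray *strings)
{
  return [strings containsObjectMatchingRegex:@"^\\d+\\.\\d+$"];
}

/* The number of strings in the array that look like 123.45. */
static unsigned long RKGuideDecimalCount(NSArray *strings)
{
  return (unsigned long)[strings countOfObjectsMatchingRegex:@"^\\d+\\.\\d+$"];
}

/* A new array containing only the strings that look like 123.45. */
static NSArray *RKGuideDecimalsOnly(NSArray *strings)
{
  return [strings arrayByMatchingObjectsWithRegex:@"^\\d+\\.\\d+$"];
}
```

If the assumed method names are correct, passing an array such as [NSArray arrayWithObjects:@"149.23", @"n/a", @"151.29", nil] should report a match, a count of two, and a new array holding the two numeric strings.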

NSString Additions

Capture Extraction

Extracting the text of a capture subpattern is done by sending getCapturesWithRegexAndReferences: to a member of the NSString class. The first argument is the regular expression that you would like to match, followed by a nil terminated, variable length list of key and pointer to a pointer arguments.

[aString getCapturesWithRegexAndReferences:aRegex, key, pointer to a pointer, ..., nil]

aString
The NSString to search with aRegex.
aRegex
A regular expression as either an NSString or an instantiated RKRegex object.
key
The Capture Subpattern Reference for the text you wish to extract, similar to Perl's $n syntax.
pointer to a pointer
A pointer to an NSString pointer where the result of key will be stored.
...
An optional list of additional key / pointer to a pointer pairs.
nil
The required nil terminator.

[@"You owe: 1234.56 (tip not included)" getCapturesWithRegexAndReferences:@"(\\d+\\.\\d+)", @"$1", &extractedString, nil];

The complete example:

NSString *extractedString = NULL;

[@"You owe: 1234.56 (tip not included)" getCapturesWithRegexAndReferences:@"(\\d+\\.\\d+)",
                                                                          @"$1", &extractedString,
                                                                          nil];
                                                                    
// extractedString = @"1234.56";

After executing, extractedString will contain a pointer to a newly created, autoreleased NSString containing the requested matching text. In the previous example, extractedString will point to a string that is equivalent to @"1234.56".

The following example demonstrates extracting multiple strings at once with both numbered and named capture references. Note that capture number zero refers to the entire text that the regular expression matched.

NSString *entireString = NULL, *totalString = NULL, *dollarsString = NULL, *centsString = NULL;
NSString *regexString = @"owe:\\s*\\$?(?<total>(?<dollars>\\d+)\\.(?<cents>\\d+))";

[@"You owe: 1234.56 (tip not included)" getCapturesWithRegexAndReferences:regexString,
                                                                          @"$0", &entireString,
                                                                          @"${total}", &totalString,
                                                                          @"${dollars}", &dollarsString,
                                                                          @"${cents}", &centsString,
                                                                          nil];
                                                                    
// entireString  = @"owe: 1234.56";
// totalString   = @"1234.56";
// dollarsString = @"1234";
// centsString   = @"56";

Capture Type Conversions

The RegexKit.framework NSString class additions also provide the means to automatically convert the captured text into a number of formats. This includes primitive C types and Objective-C types such as NSNumber and NSDate. Primitive type conversion is handled by the system's scanf function and therefore makes use of the same conversion syntax and specifiers that scanf does. If you are unfamiliar with the syntax, it is generally the same percent-style used for converting primitive types into string form. For example, "%d" would convert an ASCII string form of "4259" into an int with the value of 4259.

In the following example, the text of the hex color value is matched and returned as an NSString.

NSString *capturedColor = NULL;

[@"Hex color 0x8f239aff is the best!" getCapturesWithRegexAndReferences:@"Hex color (0x\\w+\\b)", @"$1", &capturedColor, nil];

// capturedColor = @"0x8f239aff";

While useful, additional operations on the value represented by the text are much easier if it is converted into a primitive type, such as an unsigned int. This is accomplished by specifying the conversion type desired with the capture reference. To avoid ambiguity the capture reference must contain a pair of curly braces ('{' and '}'), which contain both the capture subpattern reference and the conversion specification separated by a colon (':') character. For example, ${1:%x} refers to capture subpattern number one and specifies %x, or hexadecimal to unsigned int, for the conversion.

As an example, the following matches the text for the hex color in string form, converts it to an unsigned int, and stores the result in hexColor.

unsigned int hexColor = 0x0;

[@"Hex color 0x8f239aff is the best!" getCapturesWithRegexAndReferences:@"Hex color (0x\\w+\\b)", @"${1:%x}", &hexColor, nil];

// hexColor = 0x8f239aff;

This same simple conversion, without the help of regular expressions and automatic type conversion, typically spans multiple lines. First, one has to scan the subject string and find the range of interest. Once found, that text is usually copied to a temporary buffer. Finally, the appropriate string to value conversion function is called. With the ability to perform type conversions as part of the matching process, getCapturesWithRegexAndReferences: makes quick work of what was once a tedious and error prone process.

In addition to converting matched text to basic C data types, you can also convert matched text to NSNumber and NSCalendarDate objects. The following example demonstrates the conversion to an NSNumber using the NSNumberFormatterSpellOutStyle number format style.

NSString *subjectString = @"He said the speed was 'one hundred and five point three two'.";
NSNumber *convertedNumber = NULL;

[subjectString getCapturesWithRegexAndReferences:@"'([^\\']*)'", @"${1:@wn}", &convertedNumber, nil];

// [convertedNumber doubleValue] = 105.32;

The @d type conversion can parse a wide range of date formats, returning an NSCalendarDate object:

NSString *subjectString = @"Current date and time: 6/20/2007, 11:34PM EDT.";
NSCalendarDate *convertedDate = NULL;

[subjectString getCapturesWithRegexAndReferences:@":\\s*(?<date>.*)\\.", @"${date:@d}", &convertedDate, nil];
NSLog(@"Converted date = %@\n", convertedDate);

// NSLog output: Converted date = 2007-06-20 23:34:00 -0400

Determining if a String is Matched by a Regular Expression

The method isMatchedByRegex: can be used to check if a string is matched, or not matched, by a regular expression.

BOOL didMatch           = NO;
NSString *subjectString = @"Only the first match of 'first' is matched";

didMatch = [subjectString isMatchedByRegex:@"first"];

// didMatch = YES

Or, as an example of a regular expression not matching a string:

BOOL didMatch           = YES;
NSString *subjectString = @"Only the first match of 'first' is matched";

didMatch = [subjectString isMatchedByRegex:@"second"];

// didMatch = NO

The method isMatchedByRegex:inRange: can be used to alter the range of the string to check for a match as well:

BOOL didMatch           = YES;
NSString *subjectString = @"Only the first match of 'first' is matched";

didMatch = [subjectString isMatchedByRegex:@"first" inRange:NSMakeRange(0, 12)];

// didMatch = NO

Even though the range of the string specified, Only the fir, contains the first few characters of the regular expression to match, the result is NO since the entire regular expression, first, is not matched.

Finding the Range of a Match

To find the entire range that a regular expression matches in a string, you can use the rangeOfRegex: method. For example:

NSRange matchRange      = NSMakeRange(NSNotFound, 0);
NSString *subjectString = @"Only the first match of 'first' is matched";

matchRange = [subjectString rangeOfRegex:@"first"];

// matchRange = {9, 5} == "first"

Or, if the string is not matched by the regular expression, the range {NSNotFound, 0} is returned.

NSRange matchRange      = NSMakeRange(NSNotFound, 0);
NSString *subjectString = @"Only the first match of 'first' is matched";

matchRange = [subjectString rangeOfRegex:@"second"];

// matchRange = {NSNotFound, 0}

As the first example demonstrates, only the result of the first match of a regular expression is returned. Additional results would require invoking rangeOfRegex:inRange:capture: with a capture of 0 with a range that begins at the end of the last match. For example:

NSRange matchRange      = NSMakeRange(NSNotFound, 0);
NSString *subjectString = @"Only the first match of 'first' is matched";

matchRange = [subjectString rangeOfRegex:@"first" inRange:NSMakeRange(9 + 5, [subjectString length] - (9 + 5)) capture:0];

// matchRange = {25, 5} == "first"
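Repeating that step until nothing more is found gives every match in the string. The following is a minimal sketch of such a loop built only on rangeOfRegex:inRange:capture:. The function name is hypothetical, and the extra step for zero-length matches is a precaution added here rather than something this guide prescribes.

```objective-c
#import <Foundation/Foundation.h>
#import <RegexKit/RegexKit.h>

/* Hypothetical helper, for illustration only.
   Returns an array of NSStringFromRange() strings, one per match of regexString in subject. */
static NSArray *RKGuideAllMatchRanges(NSString *subject, NSString *regexString)
{
  NSMutableArray *found = [NSMutableArray array];
  NSRange searchRange   = NSMakeRange(0, [subject length]);

  for(;;) {
    NSRange matchRange = [subject rangeOfRegex:regexString inRange:searchRange capture:0];
    if(matchRange.location == NSNotFound) { break; } /* No further matches. */

    [found addObject:NSStringFromRange(matchRange)];

    /* Resume searching at the end of this match. */
    searchRange.location = NSMaxRange(matchRange);
    /* Precaution: step past a zero-length match so the loop always advances. */
    if(matchRange.length == 0) { searchRange.location++; }
    if(searchRange.location > [subject length]) { break; }
    searchRange.length = [subject length] - searchRange.location;
  }

  return(found);
}
```

For the subject string used above and the regular expression first, the returned array contains @"{9, 5}" and @"{25, 5}". The RKEnumerator class, described later, is the more convenient way to do the same thing.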

rangeOfRegex:inRange:capture: can also be used to find the range of a capture subpattern as well:

NSRange matchRange      = NSMakeRange(NSNotFound, 0);
NSString *subjectString = @"Only the first match of 'first' is matched";

matchRange = [subjectString rangeOfRegex:@"'?(first)'?\\s*(\\S+)" inRange:NSMakeRange(0, [subjectString length]) capture:2];

// matchRange = {15, 5} == "match"

To obtain ranges for all the capture subpatterns of a regular expression match, you can use the method rangesOfRegex: and its companion rangesOfRegex:inRange:. These methods return a pointer to an array of NSRange structures. The pointer returned is to a block of autoreleased memory that is sizeof(NSRange) * [regex captureCount] bytes big. The memory containing the range results will be released, and therefore invalid, once the current NSAutoreleasePool is released. You should not keep a pointer to the returned buffer past this point, and you should copy any results that you require past the current NSAutoreleasePool context. As an example:

NSRange *matchRanges    = NULL;
NSString *subjectString = @"Only the first match of 'first' is matched";

matchRanges = [subjectString rangesOfRegex:@"'?(first)'?\\s*(\\S+)"];

// matchRanges[0] = {9, 11} == "first match" (capture 0, the entire match)
// matchRanges[1] = {9, 5}  == "first"
// matchRanges[2] = {15, 5} == "match"
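Because the returned buffer is only valid until the current NSAutoreleasePool is released, results that must live longer need to be copied out. The following sketch copies the ranges into storage owned by the caller. The function name is hypothetical, the count of three (capture 0 plus the two subpatterns) is hard-coded for this particular regular expression, and the NULL check is purely defensive since this guide does not say what is returned when nothing matches.

```objective-c
#import <Foundation/Foundation.h>
#import <RegexKit/RegexKit.h>
#include <string.h>

/* Hypothetical helper, for illustration only.
   outRanges must have room for 3 NSRange structures: capture 0 and the two subpatterns. */
static BOOL RKGuideCopyCaptureRanges(NSString *subject, NSRange *outRanges)
{
  NSRange *matchRanges = [subject rangesOfRegex:@"'?(first)'?\\s*(\\S+)"];

  /* Defensive check; the behavior on a failed match is not described in this guide. */
  if(matchRanges == NULL) { return(NO); }

  /* Copy the results so they survive the release of the current NSAutoreleasePool. */
  memcpy(outRanges, matchRanges, sizeof(NSRange) * 3);
  return(YES);
}
```

A caller would declare NSRange savedRanges[3]; and pass savedRanges as the second argument. The copied values remain usable after the autoreleased buffer is gone.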

Creating a New String Using the Results of a Match

RegexKit provides a number of methods that allow you to easily create new strings that include the text from a regular expression match, similar to the way that Perl allows you to access capture subpatterns with the variables $number.

The method stringByMatching:withReferenceString: allows you to create a new, temporary string in which any capture references are replaced with the matched text. For example:

NSString *newString      = NULL;
NSString *subjectString  = @"Amount due: 149.23";
NSString *regexString    = @"Amount due: (\\d+\\.\\d+)";
NSString *templateString = @"You owe: $1 (does not include tax)";

newString = [subjectString stringByMatching:regexString withReferenceString:templateString];

// newString = @"You owe: 149.23 (does not include tax)";

Strings can also be created with a combination of match results and format specification argument replacement:

NSString *newString      = NULL;
NSString *subjectString  = @"Amount due: 149.23";
NSString *regexString    = @"Amount due: (\\d+\\.\\d+)";
NSString *templateString = @"[%d of %d] You owe: $1 (does not include tax)";

newString = [subjectString stringByMatching:regexString withReferenceFormat:templateString, 1, 5];

// newString = @"[1 of 5] You owe: 149.23 (does not include tax)";

The methods stringByMatching:inRange:withReferenceString: and stringByMatching:inRange:withReferenceFormat: are also available which allow you to work on sub-ranges of strings.
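A brief sketch of the inRange: form follows. It restricts the search to the tail of the subject string so that an earlier match is skipped. The function name is hypothetical, and the caller is assumed to pass a startLocation that is no larger than the length of the string.

```objective-c
#import <Foundation/Foundation.h>
#import <RegexKit/RegexKit.h>

/* Hypothetical helper, for illustration only.
   Assumes startLocation <= [subject length]. */
static NSString *RKGuideFirstAmountFrom(NSString *subject, unsigned int startLocation)
{
  NSRange searchRange = NSMakeRange(startLocation, [subject length] - startLocation);

  /* Only the characters inside searchRange are considered for the match. */
  return [subject stringByMatching:@"(\\d+\\.\\d+)" inRange:searchRange withReferenceString:@"<$1>"];
}
```

For example, RKGuideFirstAmountFrom(@"149.23, 151.29, 157.31", 8) skips the first amount and returns @"<151.29>".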

Search and Replace

In addition to creating new strings from the results of a match, the NSString RegexKit additions also provide the means to replace the matched range with the text of a new string. The replacement string may include references to text matched by the regular expression as well. The search and replace methods allow you to specify the number of times that the regular expression can match and replace the receiver's text. A special constant, RKReplaceAll, is used to specify that all the matches in the receiver should be replaced. For example:

NSString *newString         = NULL;
NSString *subjectString     = @"Amount due: 149.23";
NSString *regexString       = @"(\\d+\\.\\d+)";
NSString *replacementString = @"--> $1 <-- (does not include tax)";

newString = [subjectString stringByMatching:regexString replace:RKReplaceAll withString:replacementString];

// newString = @"Amount due: --> 149.23 <-- (does not include tax)";

An example demonstrating multiple replacements:

NSString *newString         = NULL;
NSString *subjectString     = @"149.23, 151.29, 157.31";
NSString *regexString       = @"(\\d+\\.\\d+)";
NSString *replacementString = @"-($1)";

newString = [subjectString stringByMatching:regexString replace:RKReplaceAll withString:replacementString];

// newString = @"-(149.23), -(151.29), -(157.31)";

The same example, but replacing only the first two matches:

NSString *newString         = NULL;
NSString *subjectString     = @"149.23, 151.29, 157.31";
NSString *regexString       = @"(\\d+\\.\\d+)";
NSString *replacementString = @"-($1)";

newString = [subjectString stringByMatching:regexString replace:2 withString:replacementString];

// newString = @"-(149.23), -(151.29), 157.31";

An example of a more restrictive regular expression:

NSString *newString         = NULL;
NSString *subjectString     = @"149.23, 151.29, 157.31, 1511.29";
NSString *regexString       = @"(\\b\\d{3}\\.29)"; /* The \b word boundary keeps 1511.29 from matching at 511.29. */
NSString *replacementString = @"-($1)";

newString = [subjectString stringByMatching:regexString replace:RKReplaceAll withString:replacementString];

// newString = @"149.23, -(151.29), 157.31, 1511.29";

Enumerating all the Matches in a String by a Regular Expression

With the RKEnumerator class, you can enumerate all of the matches of a regular expression in a string the same way you might enumerate all the objects in an NSArray with an NSEnumerator. Unlike the NSEnumerator class, however, the RKEnumerator class provides a number of additional methods for accessing the details of the currently enumerated match. Many of the additional methods have analogs in the NSString RegexKit additions, such as stringWithReferenceFormat:, which allows you to create a new, temporary string with references to the currently enumerated match.

The RKEnumerator class provides a number of next... methods to advance to the next match. Which one to use depends on what you will use the match results for. The method nextRanges is the fastest and has the least internal overhead since it only updates its private buffer with the information of the next match, if any. nextObject is the slowest, as it creates an NSArray of NSValue objects containing the ranges of all the capture subpatterns.

Here are some examples demonstrating the use of RKEnumerator:

NSString *subjectString = @"149.23, 151.29, 157.31, 1511.29";
NSString *regexString   = @"(\\d+\\.\\d+)";

RKEnumerator *matchEnumerator = [subjectString matchEnumeratorWithRegex:regexString];

while([matchEnumerator nextRanges] != NULL) {
  NSLog(@"Range of match: %@", NSStringFromRange([matchEnumerator currentRange]));
}

// Outputs:
// Range of match: {0, 6}
// Range of match: {8, 6}
// Range of match: {16, 6}
// Range of match: {24, 7}

The same example, but converting the current match to a double:

NSString *subjectString = @"149.23, 151.29, 157.31, 1511.29";
NSString *regexString   = @"(\\d+\\.\\d+)";

RKEnumerator *matchEnumerator = [subjectString matchEnumeratorWithRegex:regexString];

while([matchEnumerator nextRanges] != NULL) {
  double enumeratedDouble = 0.0;
  [matchEnumerator getCapturesWithReferences:@"${1:%lf}", &enumeratedDouble, nil];
  NSLog(@"Enumerated: %.2f", enumeratedDouble);
}

// Outputs:
// Enumerated: 149.23
// Enumerated: 151.29
// Enumerated: 157.31
// Enumerated: 1511.29

An example using the stringWithReferenceFormat: method:

NSString *subjectString = @"149.23, 151.29, 157.31, 1511.29";
NSString *regexString   = @"(\\d+\\.\\d+)";
int matchNumber = 1;

RKEnumerator *matchEnumerator = [subjectString matchEnumeratorWithRegex:regexString];

while([matchEnumerator nextRanges] != NULL) {
  double enumeratedDouble = 0.0;
  NSString *newString = NULL;

  [matchEnumerator getCapturesWithReferences:@"${1:%lf}", &enumeratedDouble, nil];
  newString = [matchEnumerator stringWithReferenceFormat:@"#%d: %.2f", matchNumber, enumeratedDouble * 10.0];
  NSLog(@"String: %@", newString);
  matchNumber++;
}

// Outputs:
// String: #1: 1492.30
// String: #2: 1512.90
// String: #3: 1573.10
// String: #4: 15112.90

DTrace Probe Points in RegexKit

Mac OS X 10.5 Leopard contains a powerful new debugging facility called DTrace. DTrace, originally developed by Sun Microsystems for the Solaris Operating System, is an open source, kernel-level framework for dynamically instrumenting live systems. While there have been numerous tools in the past to record program execution trace information, such as truss or ktrace, none have been as comprehensive as DTrace. For example, DTrace allows you to trace the entry and exit of any function, in any program, and record the arguments on entry and the results on exit. Mac OS X 10.5 provides an Objective-C DTrace provider that extends that functionality to Objective-C class and instance methods. DTrace also provides extensive kernel-level tracing, including user to kernel crossings such as syscall.

DTrace also allows applications to define custom probe points that DTrace can attach to. This allows, for example, shared library developers to offer tailored probe points for specific information, instead of trying to recreate the information by tracking individual calls in to the library. RegexKit makes use of this functionality to provide a number of RegexKit specific DTrace probe points.

The following documents only the enhanced DTrace functionality that RegexKit provides; it is not a guide to the DTrace facility. The Solaris Dynamic Tracing Guide (as .PDF) is a prerequisite for making the most of DTrace and of the information outlined in the following sections.

Important:

You are strongly encouraged to familiarize yourself with the Solaris Dynamic Tracing Guide (as .PDF), not just for use with RegexKit, but in general. DTrace is a powerful, general purpose debugging tool that every Mac OS X programmer should be familiar with.

RegexKit Provider DTrace Probe Points

The following table provides a list of the available RegexKit probe points. Following the table, the details of each probe are provided, including the number of arguments, argument types, and description of each argument.

RegexKit Provider DTrace Probe Points
Probe Point	Description
RegexKit:::PerformanceNote	Fires for potential performance problems.
RegexKit:::BeginRegexCompile	Fires at the start of compiling a regular expression.
RegexKit:::EndRegexCompile	Fires at the end of compiling a regular expression.
RegexKit:::MatchException	Fires when matching results in an exception.
RegexKit:::BeginMatch	Fires at the start of a match.
RegexKit:::EndMatch	Fires at the end of a match.
RegexKit:::CacheCleared	Fires when the regular expression cache is cleared.
RegexKit:::BeginCacheLookup	Fires at the start of a cache lookup.
RegexKit:::EndCacheLookup	Fires when a cache lookup completes.
RegexKit:::BeginCacheAdd	Fires at the start of adding an object to the cache.
RegexKit:::EndCacheAdd	Fires at the end of adding an object to the cache.
RegexKit:::BeginCacheRemove	Fires at the start of removing an object from the cache.
RegexKit:::EndCacheRemove	Fires at the end of removing an object from the cache.
RegexKit:::BeginLock	Fires at the start of an attempt to acquire a lock.
RegexKit:::EndLock	Fires once a lock has been acquired.
RegexKit:::Unlock	Fires when a previously acquired lock is released and unlocked.

Performance Notification Related Probes

The following probe fires when the framework has detected a potential performance impacting condition.

PerformanceNote(void *object, NSUInteger hash, char *description, NSUInteger size,
                int impact, int noteType, char *note);

Performance Note Arguments Description
Name	Argument	Description
object	arg0	If applicable, contains a pointer to the relevant object.
hash	arg1	The hash value for object.
description	arg2	The description for object.
size	arg3	If applicable, contains the size value related to the performance note.
impact	arg4	Conditions which negatively impact performance will have a value < 0. Conditions which positively impact performance will have a value > 0.
noteType	arg5	For general performance notes, noteType is 0. For performance notes that can be timed, noteType is 1 to indicate the start, and 2 to indicate the end.
note	arg6	A pointer to a NULL terminated string containing a description of the performance condition.
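
As a rough illustration of how the impact and noteType values in the table above might be interpreted by a tool that consumes them, here is a minimal sketch in plain C. The function names are hypothetical and are not part of RegexKit. The table does not say what an impact of 0 means, so the sketch reports it as unspecified.

```c
/* Hypothetical helper, not part of RegexKit: classifies the impact argument
 * (arg4) of the PerformanceNote probe according to the table above. */
const char *performance_impact_label(int impact)
{
  if (impact < 0) { return "negative"; }
  if (impact > 0) { return "positive"; }
  return "unspecified"; /* The table does not define a meaning for 0. */
}

/* Hypothetical helper, not part of RegexKit: classifies the noteType
 * argument (arg5) of the PerformanceNote probe. */
const char *performance_note_type_label(int noteType)
{
  switch (noteType) {
    case 0:  return "general";
    case 1:  return "timed-start";
    case 2:  return "timed-end";
    default: return "unknown"; /* Values other than 0, 1 and 2 are not documented. */
  }
}
```

A pair of notes with noteType 1 and 2 brackets a timed condition, so a consumer would typically record a timestamp at the first and compute the difference at the second.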

Regular Expression Related Probes

The following probes fire at the start and end of the compiling of a regular expression in to an internal format.

BeginRegexCompile(void *regex, NSUInteger hash, char *regexCharacters, int compileOption);

EndRegexCompile(  void *regex, NSUInteger hash, char *regexCharacters, int compileOption,
                  int errorCode, char *errorCodeCharacters, char *pcreErrorCharacters,
                  int errorAtOffsetOfRegexCharacters);

Regular Expression Compile Arguments Description
Name	Argument	Description
regex	arg0	A pointer to the RKRegex object.
hash	arg1	The computed hash value for the RKRegex object.
regexCharacters	arg2	A pointer to a NULL terminated string containing the text of the regular expression.
compileOption	arg3	The RKCompileOption options for the regular expression.
errorCode	arg4	Contains the RKCompileErrorCode error code.
errorCodeCharacters	arg5	A pointer to a NULL terminated string of the text of the RKCompileErrorCode error code.
pcreErrorCharacters	arg6	If there was an error compiling the regular expression, this is set to a pointer to a NULL terminated string from the pcre library describing the error.
errorAtOffsetOfRegexCharacters	arg7	If there was an error compiling the regular expression, this contains the location of the first character in regexCharacters that caused the error.

The regexProbeObject type is used by the following match probes because only ten arguments, arg0 through arg9, are available to a probe. Although the use of regexProbeObject results in some inconvenience, it allows more information to be passed by the probe than would otherwise be available.

typedef struct {
  void *regexObject;
  const char *regexCharacters;
  int options;
} regexProbeObject;

The following probes fire at the start and end of a match by a regular expression.

BeginMatch(regexProbeObject *probeObject, NSUInteger hash, NSRange *ranges,
           NSUInteger rangeCount, void *charactersBuffer, NSUInteger length,
           NSRange *searchRange, int matchOptions);

EndMatch(  regexProbeObject *probeObject, NSUInteger hash, NSRange *ranges,
           NSUInteger rangeCount, void *charactersBuffer, NSUInteger length,
           NSRange *searchRange, int matchOptions,
           int errorCode, char *errorCodeCharacters);

Regular Expression Match Arguments Description
Name	Argument	Description
probeObject	arg0	A pointer to a regexProbeObject type.
hash	arg1	The computed hash value for the RKRegex object.
ranges	arg2	A pointer to an array of NSRange that will contain the results of the match.
rangeCount	arg3	The number of valid ranges, usually equal to the number of captures for the regular expression.
charactersBuffer	arg4	A pointer to the buffer containing the bytes to perform the match on.
length	arg5	The length, in bytes, of charactersBuffer.
searchRange	arg6	A pointer to a NSRange containing the range within charactersBuffer to perform the match on.
matchOptions	arg7	The RKMatchOption match options.
errorCode	arg8	If the match was successful, errorCode contains the number of ranges that contain valid results. Otherwise, errorCode contains the RKMatchErrorCode error code.
errorCodeCharacters	arg9	A pointer to a NULL terminated string of the text of the RKMatchErrorCode error code.

The following probe fires if a regular expression match generates an exception.

MatchException(regexProbeObject *probeObject, NSUInteger hash, NSRange *ranges,
               NSUInteger rangeCount, void *charactersBuffer, NSUInteger length,
               NSRange *searchRange, int matchOptions,
               char *exceptionNameCharacters, char *reasonCharacters);

Regular Expression Match Exception Arguments Description
Name	Argument	Description
probeObject	arg0	A pointer to a regexProbeObject type.
hash	arg1	The computed hash value for the RKRegex object.
ranges	arg2	A pointer to an array of NSRange that will contain the results of the match.
rangeCount	arg3	The number of valid ranges, usually equal to the number of captures for the regular expression.
charactersBuffer	arg4	A pointer to the buffer containing the bytes to perform the match on.
length	arg5	The length, in bytes, of charactersBuffer.
searchRange	arg6	A pointer to a NSRange containing the range within charactersBuffer to perform the match on.
matchOptions	arg7	The RKMatchOption match options.
exceptionNameCharacters	arg8	A pointer to a NULL terminated string of the name of the exception.
reasonCharacters	arg9	A pointer to a NULL terminated string for the reason of the exception.

Cache Related Probes

The following probes fire at the start and end of a cache lookup.

BeginCacheLookup(void *cache, char *description, NSUInteger lookupObjectHash,
                 char *lookupObjectDescription, int shouldAutorelease,
                 int isCacheEnabled, NSUInteger cacheHits, NSUInteger cacheMisses);

EndCacheLookup(  void *cache, char *description, NSUInteger lookupObjectHash,
                 char *lookupObjectDescription, int shouldAutorelease,
                 int isCacheEnabled, NSUInteger cacheHits, NSUInteger cacheMisses,
                 NSUInteger currentCount, void *cachedObject);

Cache Look Up Arguments Description
Name	Argument	Description
cache	arg0	A pointer to the RKCache object.
description	arg1	A pointer to a NULL terminated string containing a description of the RKCache object.
lookupObjectHash	arg2	The hash value of the object to look up in the cache.
lookupObjectDescription	arg3	A pointer to a NULL terminated string containing a description of the object to look up, if available.
shouldAutorelease	arg4	If the requested object is in the cache, it is always sent a retain message while the cache is locked. If shouldAutorelease is set to 1, the cache will send an autorelease message to the retrieved object as well.
isCacheEnabled	arg5	Set to 1 if the cache is currently enabled, 0 otherwise.
cacheHits	arg6	The number of times an object was found in the cache since the last reset of the counters.
cacheMisses	arg7	The number of times the cache was unable to fulfill a lookup request with a cached result since the last reset of the counters.
currentCount	arg8	The number of objects currently in the cache.
cachedObject	arg9	Set to NULL if the cache was unable to fulfill the look up request with a result from the cache, otherwise cachedObject contains a pointer to the object found in the cache.
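
The cacheHits and cacheMisses arguments can be combined into a hit ratio, which is often easier to read than the raw counters. The following plain C sketch shows the arithmetic; the function name cache_hit_ratio is hypothetical, and returning 0.0 when no lookups have been recorded is a choice made for this sketch.

```c
/* Hypothetical helper, not part of RegexKit: computes the fraction of cache
 * lookups that were satisfied from the cache, given the cacheHits (arg6) and
 * cacheMisses (arg7) values reported by the cache lookup probes. */
double cache_hit_ratio(unsigned long cacheHits, unsigned long cacheMisses)
{
  unsigned long total = cacheHits + cacheMisses;

  /* Sketch-only choice: report 0.0 rather than dividing by zero when no
   * lookups have been counted since the last reset. */
  if (total == 0) { return 0.0; }

  return (double)cacheHits / (double)total;
}
```

For example, 3 hits and 1 miss give a ratio of 0.75. Both counters restart from zero whenever they are reset, so the ratio only describes lookups since the last reset.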

The following probes fire at the start and end of adding an object to a cache.

BeginCacheAdd(void *cache, char *description, void *addObject, NSUInteger addObjectHash,
              char *addObjectDescription, int isCacheEnabled);

EndCacheAdd(  void *cache, char *description, void *addObject, NSUInteger addObjectHash,
              char *addObjectDescription, int isCacheEnabled, NSUInteger currentCount,
              int didCache);

Add Object To Cache Arguments Description
Name	Argument	Description
cache	arg0	A pointer to the RKCache object.
description	arg1	A pointer to a NULL terminated string containing a description of the RKCache object.
addObject	arg2	A pointer to the object to add to the cache.
addObjectHash	arg3	The hash value of the object to add to the cache.
addObjectDescription	arg4	A pointer to a NULL terminated string containing a description of the object to add, if available.
isCacheEnabled	arg5	Set to 1 if the cache is currently enabled, 0 otherwise.
currentCount	arg6	The number of objects currently in the cache.
didCache	arg7	Set to 1 if the object did not already exist in the cache and was successfully added, 0 otherwise.

The following probes fire at the start and end of removing an object from a cache.

BeginCacheRemove(void *cache, char *description, NSUInteger removeObjectHash,
                 int isCacheEnabled);

EndCacheRemove(  void *cache, char *description, NSUInteger removeObjectHash,
                 int isCacheEnabled, NSUInteger currentCount,
                 void *removedObject, char *removedObjectDescription);

Cache Object Removal Arguments Description
Name	Argument	Description
cache	arg0	A pointer to the RKCache object.
description	arg1	A pointer to a NULL terminated string containing a description of the RKCache object.
removeObjectHash	arg2	The hash value of the object to remove from the cache.
isCacheEnabled	arg3	Set to 1 if the cache is currently enabled, 0 otherwise.
currentCount	arg4	The number of objects currently in the cache.
removedObject	arg5	If the object was removed from the cache, removedObject contains a pointer to the removed object. If the object could not be removed, or did not exist in the cache, removedObject contains NULL.
removedObjectDescription	arg6	A pointer to a NULL terminated string containing a description of the removed object, if available.

The following probe fires whenever the statistic counters for a cache are cleared. The probe provides the cache hit and miss counts just prior to clearing the counters.

CacheCleared(void *cache, char *description, int didClearCache, NSUInteger cacheClearedCount,
             NSUInteger preClearCacheHits, NSUInteger preClearCacheMissed);

Cache Clearing Arguments Description
Name	Argument	Description
cache	arg0	A pointer to the RKCache object.
description	arg1	A pointer to a NULL terminated string containing a description of the RKCache object.
didClearCache	arg2	Set to 1 if the cache was successfully cleared, 0 otherwise.
cacheClearedCount	arg3	The number of times the cache was cleared.
preClearCacheHits	arg4	The number of times an object was found in the cache before the counter was cleared.
preClearCacheMissed	arg5	The number of times the cache was unable to fulfill a lookup request with a cached result before the counter was cleared.

Multithreaded Locks Related Probes

The following probes fire at the start and end of an attempt to acquire a multithreaded lock. The Unlock probe fires when a thread relinquishes a lock.

BeginLock(void *lock, int forWriting, int isMultithreaded);

EndLock(  void *lock, int forWriting, int isMultithreaded, int acquiredLock, NSUInteger spinCount); 

Unlock(   void *lock, int forWriting, int isMultithreaded);

Multithreaded Locks Arguments Description
Name	Argument	Description
lock	arg0	A pointer to the RKLock object.
forWriting	arg1	Set to 1 if the request is a write lock, 0 otherwise.
isMultithreaded	arg2	Set to 1 if the lock has switched to full multithreading mode, or 0 if the lock is still in single threaded performance mode.
acquiredLock	arg3	Set to 1 if the lock was successfully acquired, 0 otherwise.
spinCount	arg4	The number of times that an attempt was made to acquire the lock, but the lock was busy and unavailable.

Accessing Probes From the Shell

Important:

Most uses of the dtrace command require superuser privileges. The following examples use sudo to execute dtrace as the root user.

The following is an example of a dtrace script file. It uses the aggregation variable type to record the number of times a select number of RegexKit probes fire. Then, once per second, the script outputs the current counts and resets the aggregation variables to zero and begins counting again.

An example dtrace script file, perSecond.d :

#pragma D option switchrate=10msec
#pragma D option bufsize=15m
#pragma D option quiet

BEGIN { perTimeUnitStart = walltimestamp; }

RegexKit*:::EndMatch        { @matchCount   = count(); }
RegexKit*:::EndRegexCompile { @compileCount = count(); }
RegexKit*:::EndCacheLookup  { @lookupCount  = count(); }

tick-1sec
{
  normalize(@matchCount,   (walltimestamp - perTimeUnitStart) / 1000000000);
  normalize(@compileCount, (walltimestamp - perTimeUnitStart) / 1000000000);
  normalize(@lookupCount,  (walltimestamp - perTimeUnitStart) / 1000000000);
  printa("Matches %@8d/sec, Compiles %@8d/sec, Cache Lookups %@8d/sec\n", @matchCount, @compileCount, @lookupCount);
  perTimeUnitStart = walltimestamp;
  clear(@matchCount);
  clear(@compileCount);
  clear(@lookupCount);
}

The example script above can be saved to a file named perSecond.d and then executed with the following command:

shell% sudo dtrace -Z -s perSecond.d
Matches     1216/sec, Compiles      156/sec, Cache Lookups      456/sec
Matches     1883/sec, Compiles      345/sec, Cache Lookups     2917/sec
Matches    25148/sec, Compiles     3453/sec, Cache Lookups    35248/sec
Matches    26968/sec, Compiles     3697/sec, Cache Lookups    34116/sec
Matches    28312/sec, Compiles     3586/sec, Cache Lookups    40055/sec
Matches    25688/sec, Compiles     4196/sec, Cache Lookups    34066/sec
Matches    28298/sec, Compiles     3178/sec, Cache Lookups    38510/sec
Matches    26780/sec, Compiles     4902/sec, Cache Lookups    33363/sec
Matches    29088/sec, Compiles     3560/sec, Cache Lookups    38429/sec
^C

shell%
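
Each figure in the output above is a raw count divided by the number of whole seconds since the previous report, which is what the normalize() calls in perSecond.d compute. The following plain C sketch shows the same integer arithmetic. The function name per_second_rate is hypothetical, and the guard against an interval shorter than one second is an addition made for this sketch, not something the script does.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: converts a raw event count into events per second
 * using the same integer arithmetic as
 *   normalize(@count, (walltimestamp - perTimeUnitStart) / 1000000000);
 * Both timestamps are in nanoseconds. */
uint64_t per_second_rate(uint64_t count, uint64_t start_nsec, uint64_t now_nsec)
{
  assert(now_nsec >= start_nsec);

  /* Integer division: any fraction of a second is discarded. */
  uint64_t elapsed_seconds = (now_nsec - start_nsec) / 1000000000ULL;

  /* Sketch-only guard: treat an interval of less than one whole second as
   * one second so the division below never divides by zero. */
  if (elapsed_seconds == 0) { elapsed_seconds = 1; }

  return count / elapsed_seconds;
}
```

Because the elapsed time is truncated to whole seconds, a count of 3000 gathered over 2.5 seconds is reported as 1500 per second rather than 1200.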

Tip:

You can syntax check a script, without sudo, using dtrace -e -s SCRIPT

It is important to note that in the above example output, the executable to be traced was already executing before the dtrace command was executed. This illustrates the dynamic part of dtrace. The probe points are always active in the RegexKit framework, and they can be enabled on the fly at any time.

You can also specify probes to match as an argument to the dtrace command. The following two examples demonstrate the aggregation functionality of dtrace by counting the number of samples within an aggregation bin.

Note:

The following statistics were gathered on a 1.5GHz G4 PowerBook while the RegexKit Unit Tests were executing.

The first example measures the amount of time, in microseconds, it takes to perform a lookup in the cache. The left-hand side is the number of microseconds, from zero to 20, and the right-hand side is the number of samples counted for each bin. The majority of samples are in the 4 to 6 microsecond range.

shell% sudo dtrace -Z -q -n 'RegexKit*:::BeginCacheLookup { self->lookupStartTime = vtimestamp; }' -n 'RegexKit*:::EndCacheLookup { @lookups = lquantize(((vtimestamp - self->lookupStartTime) / 1000), 0, 20, 1); }'
after a few seconds...
^C


           value  ------------- Distribution ------------- count    
               3 |                                         0        
               4 |@@@                                      133696   
               5 |@@@@@@@@@@@@@@@@@@@@                     768620   
               6 |@@@@@@@@@@@@@@@                          595496   
               7 |                                         16868    
               8 |                                         8733     
               9 |                                         4119     
              10 |                                         3112     
              11 |                                         3071     
              12 |                                         1690     
              13 |                                         1831     
              14 |                                         1050     
              15 |                                         917      
              16 |                                         811      
              17 |                                         670      
              18 |                                         714      
              19 |                                         869      
           >= 20 |                                         10541    

shell% 
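
To make the binning concrete, the following plain C sketch maps a sample to the row it would be counted in for lquantize(value, 0, 20, 1). The function name lquantize_bucket and the index layout are assumptions made for illustration and are not DTrace internals. The sketch also assumes that the range divides evenly by the step.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper, not part of DTrace or RegexKit: returns the index of
 * the linear bucket that a sample falls in, mirroring the rows printed for
 * lquantize(value, lower, upper, step).
 *   index 0            : value <  lower
 *   index 1 .. levels  : lower + (index - 1) * step <= value < lower + index * step
 *   index levels + 1   : value >= upper
 * where levels = (upper - lower) / step. */
size_t lquantize_bucket(long value, long lower, long upper, long step)
{
  /* Sketch-only simplification: the range must divide evenly by the step. */
  assert(step > 0 && upper > lower && ((upper - lower) % step) == 0);

  size_t levels = (size_t)((upper - lower) / step);

  if (value < lower)  { return 0; }
  if (value >= upper) { return levels + 1; }
  return (size_t)((value - lower) / step) + 1;
}
```

With the arguments used above, a 5 microsecond sample lands at index 6, which is the row labelled 5, and any sample of 20 microseconds or more lands in the final >= 20 row.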

Next, the amount of time to compile a regular expression is graphed with the same units of measurements (zero to 20 microseconds).

shell% sudo dtrace -Z -q -n 'RegexKit*:::BeginRegexCompile { self->compileStartTime = vtimestamp; }' -n 'RegexKit*:::EndRegexCompile { @compiles = lquantize(((vtimestamp - self->compileStartTime) / 1000), 0, 20, 1); }'
after a few seconds...
^C


           value  ------------- Distribution ------------- count    
               4 |                                         0        
               5 |                                         223      
               6 |@                                        1006     
               7 |@                                        1062     
               8 |@                                        1276     
               9 |@                                        1107     
              10 |                                         477      
              11 |                                         582      
              12 |@                                        977      
              13 |                                         584      
              14 |@@@@@@@@@@@@@@@@@@@@@@@@@@               37308    
              15 |@@@                                      5033     
              16 |@@                                       2372     
              17 |@                                        897      
              18 |                                         577      
              19 |                                         517      
           >= 20 |@@@                                      3884     

shell% 

It's easy to see that most of the samples for compiling a regular expression fall in the 14 microsecond bin, and that the distribution is much more spread out than the cache lookup distribution. This also demonstrates the usefulness of the RegexKit cache: a typical cache lookup (about 5 microseconds) is nearly three times faster than a typical compile (about 14 microseconds).
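
One rough way to compare the two distributions is to take the most heavily populated row of each. The plain C sketch below does this with the counts copied from the output above. The function names are hypothetical, and the >= 20 rows are left out because they do not correspond to a single value.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: returns the index of the largest count in a
 * histogram, i.e. the modal bucket. Ties resolve to the lowest index. */
size_t modal_bucket(const unsigned long *counts, size_t n)
{
  assert(counts != NULL && n > 0);

  size_t best = 0;
  for (size_t i = 1; i < n; i++) {
    if (counts[i] > counts[best]) { best = i; }
  }
  return best;
}

/* Counts copied from the cache lookup distribution, rows 3 through 19. */
static const unsigned long lookup_counts[] = {
  0, 133696, 768620, 595496, 16868, 8733, 4119, 3112, 3071,
  1690, 1831, 1050, 917, 811, 670, 714, 869
};

/* Counts copied from the compile distribution, rows 4 through 19. */
static const unsigned long compile_counts[] = {
  0, 223, 1006, 1062, 1276, 1107, 477, 582, 977,
  584, 37308, 5033, 2372, 897, 577, 517
};

/* Modal cache lookup time in microseconds (the first row is 3). */
unsigned long modal_lookup_usec(void)
{
  return 3UL + (unsigned long)modal_bucket(lookup_counts, sizeof(lookup_counts) / sizeof(lookup_counts[0]));
}

/* Modal compile time in microseconds (the first row is 4). */
unsigned long modal_compile_usec(void)
{
  return 4UL + (unsigned long)modal_bucket(compile_counts, sizeof(compile_counts) / sizeof(compile_counts[0]));
}
```

The modal rows are 5 and 14 microseconds, a ratio of 14 / 5 = 2.8. This compares only the most common row of each distribution, so it is a rough figure rather than a comparison of means.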

Solaris Dynamic Tracing Guide (as .PDF)

Accessing Probes With Instruments.app

Warning:

Several bugs in Instruments.app were encountered during the creation of the RegexKit instruments. Using instruments modified from within Instruments.app may not produce the expected results. Since every component used to generate the results, from RegexKit to DTrace to Instruments.app, was only very recently released, the results should not be considered authoritative or necessarily even correct.

RegexKit includes a number of instruments tailored to RegexKit for Instruments.app. These are installed in /Developer/Library/Instruments/PlugIns automatically if Instruments.app is installed.

Important:

The file format for creating Instruments.app instruments is currently not documented, therefore the following should be considered experimental. The following were tested on Instruments.app version 1.0 (72).

Description of RegexKit Instruments for Instruments.app
Instrument	Description
Cache Lookup Timing	Records the time it takes to retrieve a regular expression from the cache in microseconds.
Collection Cache	Records the effectiveness of the Least Recently Used negative-hit sorted regex collection cache.
Collection Timing	Records the time it takes to determine if a regular expression in a collection matches a target string in microseconds.
Compile Errors	Records regular expressions that failed to compile due to an error.
Compile Timing	Records the time it takes to compile a regular expression in microseconds.
Lock Timing	Records timing information for multithreaded locks in microseconds.
Match Errors	Records matches that result in an error.
Match Timing	Records the time it takes to perform a match in microseconds.
Per Second	Records per second statistics.
Performance Notes	Records potential performance problems that the framework has detected.

The use of Instruments.app is not covered here. Nonetheless, using the provided instruments should be straightforward. Instruments.app also lets you create your own DTrace scripts, so you can create or modify scripts to extract the information that you require.

Instruments User Guide

Adding the RegexKit.framework to your Project

Important:

The prebuilt framework included with the distribution, /Developer/Local/Frameworks/RegexKit.framework, may only be used as an embedded private framework. It can only be installed inside your application's bundle, i.e. My App.app/Contents/Frameworks/RegexKit.framework. It should not be installed in /Library/Frameworks or ~/Library/Frameworks.

Adding the framework to your project is fairly straightforward. These directions cover adding the framework to your project as an embedded private framework. An embedded private framework is just like a standard framework, such as Cocoa, except that a copy of the embedded private framework is included inside your application's .app bundle in the My App.app/Contents/Frameworks directory.

Your application's executable file, which is in the My App.app/Contents/MacOS directory, is then dynamically linked to the embedded private framework. The linker records that the embedded private framework, and therefore the shared library that contains the framework's code, is located within the application's bundle. Then, when your application is executed, the dynamic linker knows to find the framework's shared library in the application's bundle and not in the standard framework search paths, such as /System/Library/Frameworks or /Library/Frameworks.

Outline of Required Steps

The following outlines the steps required to use the framework in your project.

Linking the framework to your executable.
Adding a Copy Files build phase to your executable target.
Importing the RegexKit/RegexKit.h header.

Important:

These instructions apply to Xcode versions 2.4.1 and 3.0. Other versions should be similar, but may vary for specific details.

Important:

These instructions assume you have installed the framework in the default location /Developer/Local/Frameworks/RegexKit.framework.

Linking to the Framework

Using the framework requires that you link your application to it and copy it into your application's bundle. Figure 1 shows a typical new application in Xcode.

Figure 1: The start of a new Xcode application

You link to the framework as follows:

Add the framework to the resources that Xcode is aware of for your application by expanding the Frameworks group. Then, right-click on Linked Frameworks and choose Add > Existing Frameworks... as shown in Figure 2.

Figure 2: Adding an existing framework

Choose /Developer/Local/Frameworks/RegexKit.framework. Xcode will then ask which targets to add the framework to. Select your application if it is not already selected. When you have selected all the targets you would like to add the framework to, click the Add button. The RegexKit.framework should now appear within the Linked Frameworks group. Additionally, the framework should automatically appear under the Link Binary With Libraries build phase for your application as shown in Figure 3.

Figure 3: The application linked to the framework

Copying the Framework to your Application's Bundle

Next, you will need to add a Copy Files build phase to your application's target.

Within the Targets group, right-click on your application and choose Add > New Build Phase > New Copy Files Build Phase as shown in Figure 4.

Figure 4: Adding a Copy Files build phase to the application's target

A window titled Copy Files Phase for "Your Application" Info will appear. Choose Frameworks from the Destination pop-up menu leaving the Path field empty and the Copy only when installing checkbox deselected. The window should now look like Figure 5. When finished, close the window.

Figure 5: Choosing the destination for the Copy Files build phase

Finally, add the RegexKit.framework to the files to be copied. Choose the RegexKit.framework from Frameworks > Linked Frameworks and drag it to the newly created Copy Files build phase as shown in Figure 6.

Figure 6: Adding the framework to the files to be copied

Important:

The order in which the Copy Files phase takes place is not critical as the copied framework is only required when the application is run, not during the build. Xcode uses the framework files that are to be copied to complete the actual build operation. You may leave the Copy Files phase after the Link Binary With Libraries phase, or drag the Copy Files phase to the position after the Copy Bundle Resources phase.

Importing the RegexKit.h Header

For each of your fileName.m files that makes use of RegexKit.framework functionality, you will need to add a statement to include the RegexKit.h header. This is normally accomplished by adding the statement #import <RegexKit/RegexKit.h> to fileName.h. For example:

//
//  myController.h
//  My New App
//
//  Created by You on 1/1/08.
//  Copyright 2008 __MyCompanyName__. All rights reserved.
//

#import <Cocoa/Cocoa.h>
#import <RegexKit/RegexKit.h>

Optionally, although recommended, you can add the RegexKit.h header to the list of headers that Xcode precompiles. This can reduce compile times because the header is processed only once ahead of time, instead of each time that it is imported. By default, Xcode creates a file called Application_Prefix.pch that is within the Other Sources group. To include the RegexKit.h header in the header files that Xcode precompiles, you need to add a #import <RegexKit/RegexKit.h> statement to Application_Prefix.pch. A typical file would look something like:

//
// Prefix header for all source files of the 'My New App' target in the 'My New App' project
//

#ifdef __OBJC__
#import <Cocoa/Cocoa.h>
#import <RegexKit/RegexKit.h>
#endif

Clean any targets that you have made changes to. The easiest way to do this is to clean all the targets by choosing Build > Clean All Targets from the menu bar and then selecting the Also Clean Dependencies and Also Remove Precompiled Headers checkboxes in the dialog that appears.
Rebuild the Code Sense Index. To make sure that Xcode's Code Sense feature includes the definitions from RegexKit.framework, it's a good idea to rebuild the Code Sense Index. From the menu bar, choose Project > Edit Project Settings and click on the General tab in the window that appears. Then, within the General pane, click on the Rebuild Code Sense Index button that is near the bottom.

Finished

Your application is now set up to use the framework. When you compile your application, Xcode will copy all the files necessary to use the RegexKit.framework into your application's bundle.

License Information

The code for this framework is licensed under what is commonly known as the revised, 3-clause BSD-Style license.

License

Redistribution and use in source and binary forms, with or without modification, are permitted provided that the following conditions are met:

Redistributions of source code must retain the above copyright notice, this list of conditions and the following disclaimer.
Redistributions in binary form must reproduce the above copyright notice, this list of conditions and the following disclaimer in the documentation and/or other materials provided with the distribution.
Neither the name of the Zang Industries nor the names of its contributors may be used to endorse or promote products derived from this software without specific prior written permission.

THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.