This document introduces the RegexKit.framework for the Objective-C language and demonstrates how to use regular expressions in your project. The RegexKit.framework enables easy access to regular expressions by providing a number of additions to standard Foundation classes, such as NSArray, NSDictionary, NSSet, and NSString, along with their mutable variants. The RegexKit.framework acts as a bridge between the Foundation classes and the PCRE (Perl Compatible Regular Expression) library, available at www.pcre.org.
Regular expressions can be quite complex. This section is in no way meant to be a comprehensive overview of regular expressions; it is only a pragmatic introduction highlighting some of the features that can be put to immediate use. Since this framework uses the PCRE library to perform the actual regular expression matching, you should familiarize yourself with the specifics of the PCRE Regular Expression Syntax.
Pattern | Description |
---|---|
. | Match any character except New Line |
\ | Escape the next character |
^ | Match the beginning of a line |
$ | Match the end of a line |
| | Alternative |
( ) | Capture subpattern grouping |
[ ] | Character class |

Generic character types

Pattern | Description |
---|---|
\d | Any decimal digit |
\D | Any character that is not a decimal digit |
\s | Any whitespace character |
\S | Any character that is not a whitespace character |
\w | Any word character |
\W | Any non-word character |
Pattern | Description |
---|---|
* | Match 0 or more times |
+ | Match 1 or more times |
? | Match 1 or 0 times |
{n} | Match exactly n times |
{n,} | Match at least n times |
{n,m} | Match at least n but not more than m times |
One of the basic primitives of regular expressions is the pair What to match and How many times to match, or Quantity. Some what-to-match sequences are so common, such as the whitespace characters or the word characters, that they have a special shorthand notation in regular expressions. See the table Generic character types for some of the more common shorthands.
As an example, suppose you wanted to match a number of the form 123.45. A very simple pattern that would match this is \d+\.\d+. Note that the decimal point is escaped in the regular expression with a \ because we want to match the literal character . and not its normal regular expression meaning, which is to match any character. Without the escape, the regular expression would also match 123z45, which is clearly not what we want.
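The effect of the escape can be checked with any PCRE-style engine. The following is a minimal sketch that uses Python's re module as a stand-in rather than RegexKit itself; the helper name matches_whole is hypothetical and exists only for this illustration.

```python
import re

# The pattern from the text: digits, a literal '.', then more digits.
escaped = re.compile(r"\d+\.\d+")
# The same pattern without the escape: '.' now matches any character
# except a newline.
unescaped = re.compile(r"\d+.\d+")

def matches_whole(pattern, text):
    """Return True when the pattern matches the entire text."""
    return pattern.fullmatch(text) is not None

print(matches_whole(escaped, "123.45"))    # True
print(matches_whole(escaped, "123z45"))    # False
print(matches_whole(unescaped, "123z45"))  # True
```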
Often we're not interested in whether or not a regular expression matches necessarily, but we are interested in certain parts of what matched. From the previous example, suppose we were interested in the number before the decimal point, and the number after it. To do that, we use another regular expression feature called subpatterns.
Subpatterns are specified in a regular expression by enclosing part of the pattern in a pair of parentheses, ( ).
To update our previous example, the regular expression becomes (\d+)\.(\d+). The regular expression engine then provides the range of matching characters that corresponds to each subpattern, which is called a capture.
Captures are numbered sequentially, beginning at zero, in the order in which they appear in the regular expression. Capture 0 (zero) has a special meaning: it is the entire range of characters that the regular expression matched. In our example, capture 1 (one) corresponds to the number before the decimal point, and capture 2 (two) corresponds to the number after the decimal point.
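The capture numbering described above can be seen in a short sketch. It uses Python's re module as a stand-in for the matching engine, not the RegexKit API, and the subject string is made up for the illustration.

```python
import re

# Hypothetical subject text containing one number of the form 123.45.
match = re.search(r"(\d+)\.(\d+)", "The total is 1234.56 today")

whole = match.group(0)   # capture 0: the entire matched text
before = match.group(1)  # capture 1: digits before the decimal point
after = match.group(2)   # capture 2: digits after the decimal point

print(whole, before, after)  # 1234.56 1234 56
```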
Complex regular expressions might contain a number of capture subpatterns, and keeping track of the correct capture subpattern can be error prone. To make things easier, PCRE provides syntax to name a subpattern, which is (?<name>subpattern).
Capture subpatterns can be nested to an arbitrary depth as well, for example (?<total>(?<dollars>\d+)\.(?<cents>\d+)).
In the example given, the capture name dollars would include the digits up to the decimal point, cents would include the digits after the decimal point, and total would include the decimal point along with the digits before and after the decimal point.
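Named and nested captures can be tried out with the sketch below. It assumes Python's re module as a stand-in for RegexKit; Python spells a named subpattern (?P<name>...), a form that PCRE also accepts. The subject string is hypothetical.

```python
import re

# total wraps the whole number; dollars and cents are nested inside it.
pattern = re.compile(r"(?P<total>(?P<dollars>\d+)\.(?P<cents>\d+))")
match = pattern.search("Amount due: 1234.56")

# Captured text looked up by name.
captures = match.groupdict()
# Each name still has an ordinary capture number, assigned in order of
# the opening parentheses.
name_to_index = dict(pattern.groupindex)

print(captures)
print(name_to_index)
```

Looking a capture up by name keeps working if another subpattern is later added in front of it, which is the maintenance benefit the text describes.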
Past this point things start to get more complicated quickly. Using features that are not covered here, it is possible to craft a single regular expression that is capable of matching a wide range of "numbers", from the most basic representation of just numeric digits, to optionally accepting digits after the decimal point, but always requiring at least one digit before the decimal point (i.e., .234 would not be valid), with an optional scientific exponent tacked on the end. For details on these more advanced features you will need to read the PCRE Regular Expression Syntax documentation.
If you're new to regular expressions, hopefully the three basic points (what, how much, which part) covered here are enough to be useful to you. Regular expressions make most text processing tasks much easier. A common need is to strip any leading or trailing whitespace in a string, and a regular expression like \s*(.*\S+)\s* will do just that. Nearly any task that requires an NSScanner can be done faster and with less code with a regular expression.
However, regular expressions are not the solution to every pattern matching problem. Regular expressions tend to be very difficult for other programmers to read and understand, let alone maintain, and are often described as "write only." Even after careful review, there can be spectacularly large, but hidden, differences between what a regular expression actually matches and what its author intended. This can have enormous repercussions in some usage scenarios, such as depending on a regular expression for validation in a security context. As with all tools, make sure it's the right one for the problem you're trying to solve.
The RKRegex class forms the core of the RegexKit.framework. It provides basic primitives for performing regular expression matches on raw byte buffers and obtaining the results of those matches in the form of NSRange structures. It also provides methods for translating the name of a capture subpattern to its equivalent capture index number.
While you can create RKRegex objects directly, all of the extensions to the Foundation classes accept either an instantiated RKRegex object or an NSString with a regular expression pattern, which will automatically be converted to an RKRegex object for you. Usually you will only need to manually instantiate RKRegex objects when you require an unusual option to be set that can't be altered from within the regular expression pattern itself.
There are two methods used in the creation of RKRegex objects:
Method | Description |
---|---|
initWithRegexString:options: | Designated initializer. Primary means of creating RKRegex objects. |
regexWithRegexString:options: | Convenience method that allocates, initializes, and returns an autoreleased RKRegex object. |
For each regular expression match, a C style array of NSRange results is created. The index value of the NSRange array corresponds to the matching regular expression subcapture index value. Index 0 (zero) is always created and represents the entire range that the regular expression matched. Subsequent index values, up to the total capture indexes in the regular expression, represent the match range for the equivalent subcapture.
Method | Result | Description |
---|---|---|
getRanges: withCharacters: length: inRange: options: | RKMatchErrorCode | Copies an array of NSRange structures to a user supplied buffer that is at least [aRegex captureCount] big. |
matchesCharacters: length: inRange: options: | BOOL | A boolean YES or NO depending on whether or not the regular expression matched the string. |
rangeForCharacters: length: inRange: captureIndex: options: | NSRange | The NSRange for the captureIndex: subcapture |
rangesForCharacters: length: inRange: options: | NSRange * | An autoreleased block of memory containing a C array of NSRange structures, one for each capture. Accessed via NSRange[0] through NSRange[Max Subcapture]. |
Since these methods work on raw byte buffers only, and only return the range(s) of a match, they generally aren't used by end user programs directly. The RegexKit.framework provides a number of extensions to common Foundation classes, such as NSString, that are much easier to use than the raw results provided by the RKRegex class.
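To make the shape of the raw results concrete, the following sketch builds the equivalent of the per-capture range array as a list of (location, length) pairs. It uses Python's re module rather than RKRegex, the function name capture_ranges is hypothetical, and it assumes ASCII text so that character and byte offsets coincide. How RegexKit itself reports a capture that did not take part in the match is not shown here; this sketch uses (None, 0).

```python
import re

def capture_ranges(pattern, text):
    """Return one (location, length) pair per capture, or None if no match.

    Index 0 is the whole match and index n is capture n.
    """
    m = re.search(pattern, text)
    if m is None:
        return None
    ranges = []
    for index in range(m.re.groups + 1):
        start, end = m.span(index)
        # span() reports (-1, -1) for a capture that did not participate.
        ranges.append((start, end - start) if start != -1 else (None, 0))
    return ranges

print(capture_ranges(r"(\d+)\.(\d+)", "Total: 1234.56"))
# [(7, 7), (7, 4), (12, 2)]
```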
The RKCache class provides the framework's caching functionality. It provides a thread-safe means of caching and retrieving immutable objects, notably RKRegex objects. The RKRegex class makes heavy use of the RKCache class to improve the performance of the framework.
The PCRE library requires that the text form of a regular expression be parsed and compiled to an internal form that is usable by the matching routines. While this is a relatively quick operation, it is not instantaneous. The RegexKit.framework takes advantage of the fact that once a regular expression has been compiled, the compiled form is immutable and can be reused again and again.
The RegexKit.framework maintains a global RKCache of instantiated RKRegex objects. When an RKRegex is allocated and sent an initWithRegexString:options: message, it first checks the global cache. If the cache contains a match, the cached result is returned instead. Otherwise a new RKRegex is created and added to the global cache. This allows the RegexKit.framework to convert a regular expression pattern in NSString form into an RKRegex very quickly.
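The check-the-cache-first behaviour described above follows a common pattern, sketched below in Python. This is not the RKCache implementation: the class name RegexCache, its regex method, and the hits and misses counters are all hypothetical, and Python's re.compile stands in for the PCRE compile step.

```python
import re
import threading

class RegexCache:
    """A small thread-safe cache of compiled patterns (illustrative only)."""

    def __init__(self):
        self._lock = threading.Lock()
        self._compiled = {}
        self.hits = 0
        self.misses = 0

    def regex(self, pattern, options=0):
        # The pattern text and its options together identify a compiled regex.
        key = (pattern, options)
        with self._lock:
            cached = self._compiled.get(key)
            if cached is not None:
                self.hits += 1
                return cached
            # Not cached yet: compile once and remember the result.
            self.misses += 1
            compiled = re.compile(pattern, options)
            self._compiled[key] = compiled
            return compiled

cache = RegexCache()
first = cache.regex(r"\d+\.\d+")
second = cache.regex(r"\d+\.\d+")
print(first is second, cache.hits, cache.misses)  # True 1 1
```

Compiling while holding the lock keeps the sketch short. A cache tuned for heavy multithreaded use would likely avoid blocking readers during a compile.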
Caching happens automatically, and is fully multithreading safe. Information about the cache is available by invoking the status method. For example:
A common usage scenario is to apply the same regular expression to every line in a text file. The cache removes the need to anticipate when it makes sense to create an RKRegex object once and reuse it, or just make use of the convenience methods which would normally recreate the same regular expression for each invocation. This also results in less clutter in your code.
The following example iterates over all the items in stringArray and strips any leading or trailing whitespace the original string had and skips strings that are empty or contain only whitespace characters. Since the cache eliminates the need to explicitly manage RKRegex objects, the following programming style can be used without a performance penalty:
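The same programming style can be illustrated outside of RegexKit. The sketch below is written in Python, whose re module also keeps an internal cache of compiled patterns, so passing the pattern as a plain string on every iteration is reasonable there too. The function name trimmed_strings and the sample data are hypothetical.

```python
import re

# The whitespace-stripping pattern from the introduction.
TRIM_PATTERN = r"\s*(.*\S+)\s*"

def trimmed_strings(string_array):
    """Strip leading and trailing whitespace, skipping blank strings."""
    trimmed = []
    for item in string_array:
        # The pattern is passed as text each time; no regex object is managed.
        m = re.search(TRIM_PATTERN, item)
        if m is None:
            continue  # empty, or nothing but whitespace
        trimmed.append(m.group(1))
    return trimmed

result = trimmed_strings(["  alpha", "beta  ", "   ", "", "\tgamma delta \n"])
print(result)  # ['alpha', 'beta', 'gamma delta']
```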
In many cases creating a regular expression object ahead of time to use repeatedly is simply not feasible. Typically the details required to create a long lived regular expression object to use in repeated matchings are not available to the caller due to object abstraction. The following implements the NSScanner matching example from String Programming Guide for Cocoa. It is significantly more compact than the 27 lines used to implement the same functionality with NSScanner and is dramatically faster as well. Once the initial regular expression object has been cached there is virtually no overhead except for the actual matching itself. The underlying methods used by isMatchedByRegex: do not create any additional objects or memory allocations. Only stack space is required to determine if there was a successful match or not. No special steps are required by the caller of scanProductString: or by its implementor to achieve the best performance due to automatic caching.
The RKEnumerator class provides the means to enumerate all the matches of a regular expression in an NSString.
In addition to basic enumeration, the RKEnumerator class provides a number of methods to extract, convert, and format the currently enumerated match that are similar to the various NSString additions. These include methods such as getCapturesWithReferences:, stringWithReferenceString:, and stringWithReferenceFormat:.
In addition to the above, the RegexKit.framework makes a number of Objective-C category additions to the NSArray, NSDictionary, NSSet, and NSString Foundation classes. These additions are the primary means of using the RegexKit.framework.
Since regular expressions are often involved in text manipulation tasks, the NSString RegexKit Additions are covered in their own section, NSString Additions.
The remaining RegexKit additions are to the Foundation Collection class objects, NSArray, NSDictionary, and NSSet. Most of the additions are essentially the same and involve either querying a collection to determine if it contains an object that is matched by a regular expression, obtaining the count of objects matched by a regular expression, or creating a new collection from the objects matched by a regular expression.
The NSDictionary RegexKit Additions allow you to query on either keys matched by a regular expression, or objects matched by a regular expression. In both cases, an NSArray of either the matching keys or objects can be returned, or a new NSDictionary that contains the results from the regular expression match.
Along with the immutable collection classes, the mutable collection classes receive a number of additions as well. These are mostly convenience methods, allowing you to remove items from a collection based on whether or not an item is matched by a regular expression, or allowing you to add items matched by a regular expression from a second collection.
Extracting the text of a capture subpattern is done by sending getCapturesWithRegexAndReferences: to a member of the NSString class. The first argument is the regular expression that you would like to match, followed by a nil terminated, variable length list of key and pointer to a pointer arguments.
The complete example:
After executing, extractedString will contain a pointer to a newly created, autoreleased NSString containing the requested matching text. In the previous example, extractedString will point to a string that is equivalent to @"1234.56".
The following example demonstrates extracting multiple strings at once with both numbered and named capture references. Note that capture number zero refers to the entire text that the regular expression matched.
The RegexKit.framework NSString class additions also provide the means to automatically convert the captured text into a number of formats. This includes primitive C types and Objective-C types such as NSNumber and NSDate. Primitive type conversion is handled by the system's scanf function and therefore makes use of the same conversion syntax and specifiers that scanf does. If you are unfamiliar with the syntax, it is generally the same percent-style used for converting primitive types into string form. For example, "%d" would convert an ASCII string form of "4259" into an int with the value of 4259.
In the following example, the text of the hex color value is matched and returned as an NSString.
While useful, additional operations on the value represented by the text are much easier if converted into a primitive type, such as an unsigned int. This is accomplished by specifying the conversion type desired with the capture reference. To avoid ambiguity the capture reference must contain a pair of curly braces ('{' and '}'), which contain both the capture subpattern reference and the conversion specification separated by a colon (':') character. For example, ${1:%x} refers to capture subpattern number one and specifies %x, or hexadecimal to unsigned int, for the conversion.
As an example, the following matches the text for the hex color in string form, converts it to an unsigned int, and stores the result in hexColor.
This same simple conversion, without the help of regular expressions and automatic type conversion, typically spans multiple lines. First, one has to scan the subject string and find the range of interest. Once found, that text is usually copied to a temporary buffer. Finally, the appropriate string to value conversion function is called. With the ability to perform type conversions as part of the matching process, getCapturesWithRegexAndReferences: makes quick work of what was once a tedious and error prone process.
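The match-then-convert step can be illustrated outside of RegexKit. The sketch below uses Python's re module and int() with base 16 in place of the ${1:%x} reference. The function name hex_color_value, the pattern, and the sample text are assumptions made for the illustration.

```python
import re

def hex_color_value(text):
    """Find a six digit hex color such as #FF8040 and return it as an int."""
    m = re.search(r"#([0-9a-fA-F]{6})", text)
    if m is None:
        return None
    # Capture 1 holds the hex digits; base 16 plays the role of %x.
    return int(m.group(1), 16)

hex_color = hex_color_value("background-color: #FF8040;")
print(hex(hex_color))  # 0xff8040
```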
In addition to converting matched text to basic C data types, you can also convert matched text to NSNumber and NSCalendarDate objects. The following example demonstrates the conversion to an NSNumber using the NSNumberFormatterSpellOutStyle number format style.
The @d type conversion can parse a wide range of date formats, returning an NSCalendarDate object:
The method isMatchedByRegex: can be used to check if a string is matched, or not matched, by a regular expression.
Or, as an example of a regular expression not matching a string:
The method isMatchedByRegex:inRange: can be used to alter the range of the string to check for a match as well:
Even though the specified range of the string, "Only the fir", contains the first few characters of the text to match, the result is NO since the entire regular expression, first, is not matched within that range.
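The effect of restricting a match to a range can be reproduced with the sketch below. It uses Python's re module, whose search method accepts start and end positions, as a stand-in for isMatchedByRegex:inRange:. The helper is_matched and the subject string are hypothetical, chosen so that the first twelve characters are "Only the fir".

```python
import re

def is_matched(pattern, text, location=0, length=None):
    """Return True if the pattern matches inside the given range of text."""
    end = len(text) if length is None else location + length
    # The end position makes the engine behave as if the text stopped there.
    return re.compile(pattern).search(text, location, end) is not None

subject = "Only the first match counts"

print(is_matched("first", subject))         # True
print(is_matched("second", subject))        # False
print(is_matched("first", subject, 0, 12))  # False: range is "Only the fir"
```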
To find the entire range that a regular expression matches in a string, you can use the rangeOfRegex: method. For example:
Or, if the string is not matched by the regular expression, the range {NSNotFound, 0} is returned.
As the first example demonstrates, only the result of the first match of a regular expression is returned. Additional results would require invoking rangeOfRegex:inRange:capture: with a capture of 0 and a range that begins at the end of the last match. For example:
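The restart-at-the-end-of-the-last-match loop described above can be sketched as follows. This is Python using the re module, not the RegexKit method, and all_match_ranges and the sample text are hypothetical names and data.

```python
import re

def all_match_ranges(pattern, text):
    """Collect the (location, length) of every match, one search at a time."""
    regex = re.compile(pattern)
    ranges = []
    position = 0
    while position <= len(text):
        m = regex.search(text, position)
        if m is None:
            break
        ranges.append((m.start(), m.end() - m.start()))
        # Resume at the end of this match. After an empty match, step one
        # extra character so the loop always makes progress.
        position = m.end() if m.end() > m.start() else m.end() + 1
    return ranges

print(all_match_ranges(r"\d+", "10 apples, 200 pears, 3 plums"))
# [(0, 2), (11, 3), (22, 1)]
```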
rangeOfRegex:inRange:capture: can also be used to find the range of a capture subpattern:
To obtain ranges for all the capture subpatterns of a regular expression match, you can use the method rangesOfRegex: and its companion rangesOfRegex:inRange:. These methods will return a pointer to an array of NSRange structures. The pointer returned is to a block of autoreleased memory that is sizeof(NSRange) * [regex captureCount] bytes big. The memory containing the range results will be released, and therefore invalid, once the current NSAutoreleasePool is released. You should not keep a pointer to the returned buffer past this point, and you should copy any results that you require past the current NSAutoreleasePool context. As an example:
RegexKit provides a number of methods that allow you to easily create new strings that include the text from a regular expression match, similar to the way that Perl allows you to access capture subpatterns with the variables $number.
The method stringByMatching:withReferenceString: allows you to create a new, temporary string that replaces any capture references with the matched text. For example:
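The idea of expanding $number references can be sketched in a few lines. The code below is Python built on the re module, not the RegexKit implementation, and it handles only plain numbered references such as $0 and $1. The function name string_by_matching and the sample strings are hypothetical.

```python
import re

def string_by_matching(text, pattern, reference_string):
    """Expand $n references in reference_string using the first match."""
    m = re.search(pattern, text)
    if m is None:
        return None

    def expand(reference):
        # $n becomes the text of capture n. A capture that did not take
        # part in the match expands to the empty string in this sketch.
        return m.group(int(reference.group(1))) or ""

    return re.sub(r"\$(\d+)", expand, reference_string)

print(string_by_matching("Amount due: 1234.56",
                         r"(\d+)\.(\d+)",
                         "$1 dollars and $2 cents"))
# 1234 dollars and 56 cents
```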
Strings can also be created with a combination of match results and format specification argument replacement:
The methods stringByMatching:inRange:withReferenceString: and stringByMatching:inRange:withReferenceFormat: are also available which allow you to work on sub-ranges of strings.
In addition to creating new strings from the results of a match, the NSString RegexKit additions also provide the means to replace the matched range with the text of a new string. The replacement string may include references to text matched by the regular expression as well. The search and replace methods allow you to specify the number of times that the regular expression can match and replace the receiver's text. A special constant, RKReplaceAll, is used to specify that all the matches in the receiver should be replaced. For example:
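Count-limited replacement can be illustrated with Python's re.sub, used here as a stand-in for the RegexKit search and replace methods. Python writes capture references in the replacement as \1 rather than $1, and treats a count of 0 as "replace every match". The names REPLACE_ALL and replace_matches are hypothetical and only mirror the role of RKReplaceAll.

```python
import re

REPLACE_ALL = 0  # re.sub replaces every match when count is 0

def replace_matches(text, pattern, replacement, count=REPLACE_ALL):
    """Replace up to count matches of pattern, or all of them."""
    return re.sub(pattern, replacement, text, count=count)

subject = "one two three four"

print(replace_matches(subject, r"(\w+)", r"<\1>"))
# <one> <two> <three> <four>
print(replace_matches(subject, r"(\w+)", r"<\1>", count=2))
# <one> <two> three four
```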
An example demonstrating multiple replacements:
The same example, but replacing only the first two matches:
An example of a more restrictive regular expression:
With the RKEnumerator class, you can enumerate all of the matches of a regular expression in a string the same way you might enumerate all the objects in an NSArray with an NSEnumerator. Unlike the NSEnumerator class, however, the RKEnumerator class provides a number of additional methods for accessing the details of the currently enumerated match. Many of the additional methods have analogs to the NSString RegexKit additions, such as stringWithReferenceFormat:, which allows you to create a new, temporary string with references to the currently enumerated match.
The RKEnumerator class provides a number of next... methods to advance to the next match. Which one to use depends on what you will use the match results for. The method nextRanges is the fastest and has the least internal overhead since it only updates its private buffer with the information of the next match, if any. nextObject is the slowest, as it creates an NSArray of NSValue objects containing the ranges of all the capture subpatterns.
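The difference between advancing over bare ranges and building richer results per match can be sketched with Python generators. This uses re.finditer as a stand-in for RKEnumerator; the function names next_ranges and next_values and the sample text are hypothetical.

```python
import re

def next_ranges(pattern, text):
    """Yield only the (location, length) of each match, the cheapest form."""
    for m in re.finditer(pattern, text):
        yield (m.start(), m.end() - m.start())

def next_values(pattern, text):
    """Yield each match converted to a float, built on top of the ranges."""
    for location, length in next_ranges(pattern, text):
        yield float(text[location:location + length])

subject = "9.99, 14.50 and 100.25"
ranges = list(next_ranges(r"\d+\.\d+", subject))
values = list(next_values(r"\d+\.\d+", subject))

print(ranges)  # [(0, 4), (6, 5), (16, 6)]
print(values)  # [9.99, 14.5, 100.25]
```

Code that only needs positions can stop at the ranges and never pay for the conversion, which is the same trade-off the next... methods offer.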
Here are some examples demonstrating the use of RKEnumerator:
The same example, but converting the current match to a double:
An example using the stringWithReferenceFormat: method:
Mac OS X 10.5 Leopard contains a powerful new debugging facility called DTrace. DTrace, originally developed by Sun Microsystems for the Solaris Operating System, is an open source kernel level framework for dynamically instrumenting live systems. While there have been numerous tools in the past to record program execution trace information, such as truss or ktrace, none have been as comprehensive as DTrace. For example, DTrace allows you to trace the entry and exit of any function, in any program, and record the arguments on entry and the results on exit. Mac OS X 10.5 provides an Objective-C DTrace provider that extends that functionality to Objective-C class and instance methods as well. Of course, DTrace also provides extensive kernel level tracing as well, including user to kernel crossings such as syscall.
DTrace also allows applications to define custom probe points that DTrace can attach to. This allows, for example, shared library developers to offer tailored probe points for specific information, instead of trying to recreate the information by tracking individual calls into the library. RegexKit makes use of this functionality to provide a number of RegexKit specific DTrace probe points.
The following simply documents the enhanced DTrace functionality that RegexKit provides. It is not a guide to the DTrace facility. A prerequisite to making the most of DTrace is reading the Solaris Dynamic Tracing Guide (also available as a PDF). This is a must read for anyone who wishes to make the most of DTrace and the information outlined in the following sections.
The following table provides a list of the available RegexKit probe points. Following the table, the details of each probe are provided, including the number of arguments, argument types, and description of each argument.
Probe Point | Description |
---|---|
RegexKit:::PerformanceNote | Fires for potential performance problems. |
RegexKit:::BeginRegexCompile | Fires at the start of compiling a regular expression. |
RegexKit:::EndRegexCompile | Fires at the end of compiling a regular expression. |
RegexKit:::MatchException | Fires when matching results in an exception. |
RegexKit:::BeginMatch | Fires at the start of a match. |
RegexKit:::EndMatch | Fires at the end of a match. |
RegexKit:::CacheCleared | Fires when the regular expression cache is cleared. |
RegexKit:::BeginCacheLookup | Fires at the start of a cache lookup. |
RegexKit:::EndCacheLookup | Fires when a cache lookup completes. |
RegexKit:::BeginCacheAdd | Fires at the start of adding an object to the cache. |
RegexKit:::EndCacheAdd | Fires at the end of adding an object to the cache. |
RegexKit:::BeginCacheRemove | Fires at the start of removing an object from the cache. |
RegexKit:::EndCacheRemove | Fires at the end of removing an object from the cache. |
RegexKit:::BeginLock | Fires at the start of an attempt to acquire a lock. |
RegexKit:::EndLock | Fires once a lock has been acquired. |
RegexKit:::Unlock | Fires when a previously acquired lock is released and unlocked. |
The following probe fires when the framework has detected a potential performance impacting condition.
Name | Argument | Description |
---|---|---|
object | arg0 | If applicable, contains a pointer to the relevant object. |
hash | arg1 | The hash value for object. |
description | arg2 | The description for object. |
size | arg3 | If applicable, contains the size value related to the performance note. |
impact | arg4 | Conditions which negatively impact performance will have a value < 0. Conditions which positively impact performance will have a value > 0. |
noteType | arg5 | For general performance notes, noteType is 0. For performance notes that can be timed, noteType is 1 to indicate the start, and 2 to indicate the end. |
note | arg6 | A pointer to a NULL terminated string containing a description of the performance condition. |
The following probes fire at the start and end of compiling a regular expression into an internal format.
Name | Argument | Description |
---|---|---|
regex | arg0 | A pointer to the RKRegex object. |
hash | arg1 | The computed hash value for the RKRegex object. |
regexCharacters | arg2 | A pointer to a NULL terminated string containing the text of the regular expression. |
compileOption | arg3 | The RKCompileOption options for the regular expression. |
errorCode | arg4 | Contains the RKCompileErrorCode error code. |
errorCodeCharacters | arg5 | A pointer to a NULL terminated string of the text of the RKCompileErrorCode error code. |
pcreErrorCharacters | arg6 | If there was an error compiling the regular expression, this is set to a pointer to a NULL terminated string from the pcre library describing the error. |
errorAtOffsetOfRegexCharacters | arg7 | If there was an error compiling the regular expression, this contains the location of the first character in regexCharacters that caused the error. |
The regexProbeObject type is used in the following match methods as there are only ten arguments, arg0 through arg9, available to probes. Although the use of regexProbeObject results in some inconvenience, it does allow for more information to be passed by the probe than would otherwise normally be available.
The following probes fire at the start and end of a match by a regular expression.
Name | Argument | Description |
---|---|---|
probeObject | arg0 | A pointer to a regexProbeObject type. |
hash | arg1 | The computed hash value for the RKRegex object. |
ranges | arg2 | A pointer to an array of NSRange that will contain the results of the match. |
rangeCount | arg3 | The number of valid ranges, usually equal to the number of captures for the regular expression. |
charactersBuffer | arg4 | A pointer to the buffer containing the bytes to perform the match on. |
length | arg5 | The length, in bytes, of charactersBuffer. |
searchRange | arg6 | A pointer to an NSRange containing the range within charactersBuffer to perform the match on. |
matchOptions | arg7 | The RKMatchOption match options. |
errorCode | arg8 | If the match was successful, errorCode contains the number of ranges that contain valid results. Otherwise, errorCode contains the RKMatchErrorCode error code. |
errorCodeCharacters | arg9 | A pointer to a NULL terminated string of the text of the RKMatchErrorCode error code. |
The following probe fires if a regular expression match generates an exception.
Name | Argument | Description |
---|---|---|
probeObject | arg0 | A pointer to a regexProbeObject type. |
hash | arg1 | The computed hash value for the RKRegex object. |
ranges | arg2 | A pointer to an array of NSRange that will contain the results of the match. |
rangeCount | arg3 | The number of valid ranges, usually equal to the number of captures for the regular expression. |
charactersBuffer | arg4 | A pointer to the buffer containing the bytes to perform the match on. |
length | arg5 | The length, in bytes, of charactersBuffer. |
searchRange | arg6 | A pointer to an NSRange containing the range within charactersBuffer to perform the match on. |
matchOptions | arg7 | The RKMatchOption match options. |
exceptionNameCharacters | arg8 | A pointer to a NULL terminated string of the name of the exception. |
reasonCharacters | arg9 | A pointer to a NULL terminated string for the reason of the exception. |
The following probes fire at the start and end of a cache lookup.
Name | Argument | Description |
---|---|---|
cache | arg0 | A pointer to the RKCache object. |
description | arg1 | A pointer to a NULL terminated string containing a description of the RKCache object. |
lookupObjectHash | arg2 | The hash value of the object to look up in the cache. |
lookupObjectDescription | arg3 | A pointer to a NULL terminated string containing a description of the object to look up, if available. |
shouldAutorelease | arg4 | If the requested object is in the cache, it is always sent a retain message while the cache is locked. If shouldAutorelease is set to 1, the cache will send an autorelease message to the retrieved object as well. |
isCacheEnabled | arg5 | Set to 1 if the cache is currently enabled, 0 otherwise. |
cacheHits | arg6 | The number of times an object was found in the cache since the last reset of the counters. |
cacheMisses | arg7 | The number of times the cache was unable to fulfill a lookup request with a cached result since the last reset of the counters. |
currentCount | arg8 | The number of objects currently in the cache. |
cachedObject | arg9 | Set to NULL if the cache was unable to fulfill the lookup request with a result from the cache, otherwise cachedObject contains a pointer to the object found in the cache. |
The following probes fire at the start and end of adding an object to a cache.
Name | Argument | Description |
---|---|---|
cache | arg0 | A pointer to the RKCache object. |
description | arg1 | A pointer to a NULL terminated string containing a description of the RKCache object. |
addObject | arg2 | A pointer to the object to add to the cache. |
addObjectHash | arg3 | The hash value of the object to add to the cache. |
addObjectDescription | arg4 | A pointer to a NULL terminated string containing a description of the object to add, if available. |
isCacheEnabled | arg5 | Set to 1 if the cache is currently enabled, 0 otherwise. |
currentCount | arg6 | The number of objects currently in the cache. |
didCache | arg7 | Set to 1 if the object did not already exist in the cache and was successfully added, 0 otherwise. |
The following probes fire at the start and end of removing an object from a cache.
Name | Argument | Description |
---|---|---|
cache | arg0 | A pointer to the RKCache object. |
description | arg1 | A pointer to a NULL terminated string containing a description of the RKCache object. |
removeObjectHash | arg2 | The hash value of the object to remove from the cache. |
isCacheEnabled | arg3 | Set to 1 if the cache is currently enabled, 0 otherwise. |
currentCount | arg4 | The number of objects currently in the cache. |
removedObject | arg5 | If the object was removed from the cache, removedObject contains a pointer to the removed object. If the object could not be removed, or did not exist in the cache, removedObject contains NULL. |
removedObjectDescription | arg6 | A pointer to a NULL terminated string containing a description of the removed object, if available. |
The following probe fires whenever the statistic counters for a cache are cleared. The probe provides the cache hit and miss counts just prior to clearing the counters.
Name | Argument | Description |
---|---|---|
cache | arg0 | A pointer to the RKCache object. |
description | arg1 | A pointer to a NULL terminated string containing a description of the RKCache object. |
didClearCache | arg2 | Set to 1 if the cache was successfully cleared, 0 otherwise. |
cacheClearedCount | arg3 | The number of times the cache was cleared. |
preClearCacheHits | arg4 | The number of times an object was found in the cache before the counter was cleared. |
preClearCacheMissed | arg5 | The number of times the cache was unable to fulfill a lookup request with a cached result before the counter was cleared. |
The following probes fire at the start and end of an attempt to acquire a multithreaded lock. The Unlock probe fires when a thread relinquishes a lock.
Name | Argument | Description |
---|---|---|
lock | arg0 | A pointer to the RKLock object. |
forWriting | arg1 | Set to 1 if the request is a write lock, 0 otherwise. |
isMultithreaded | arg2 | Set to 1 if the lock has switched to full multithreading mode, 0 if the lock is still in single-threaded performance mode. |
acquiredLock | arg3 | Set to 1 if the lock was successfully acquired, 0 otherwise. |
spinCount | arg4 | The number of times that an attempt was made to acquire the lock, but the lock was busy and unavailable. |
The following is an example of a dtrace script file. It uses the aggregation variable type to record the number of times a select number of RegexKit probes fire. Then, once per second, the script outputs the current counts and resets the aggregation variables to zero and begins counting again.
An example dtrace script file, perSecond.d :
The example script above can be copied and pasted into a shell, and then executed with the following command:
It is important to note that in the above example output, the executable to be traced was already executing before the dtrace command was executed. This illustrates the dynamic part of dtrace. The probe points are always active in the RegexKit framework, and they can be enabled on the fly at any time.
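The count, report, and reset cycle that the script performs can be sketched in plain C. This is only an illustration of the pattern, not the D script itself: ProbeCounters, probe_fired, and flush_interval are hypothetical names, and the number of probes being counted is an assumption.

```c
#include <assert.h>
#include <string.h>

enum { PROBE_KINDS = 3 }; /* assumed number of probes being counted */

typedef struct {
    unsigned long counts[PROBE_KINDS];
} ProbeCounters;

/* Called each time the probe identified by `kind` fires. */
void probe_fired(ProbeCounters *counters, int kind) {
    assert(counters != NULL);
    assert(kind >= 0 && kind < PROBE_KINDS);
    counters->counts[kind]++;
}

/* Called once per interval (once per second in the script):
 * copies the current counts into `snapshot` for reporting,
 * then resets every counter to zero so counting starts again. */
void flush_interval(ProbeCounters *counters, unsigned long snapshot[PROBE_KINDS]) {
    assert(counters != NULL && snapshot != NULL);
    memcpy(snapshot, counters->counts, sizeof counters->counts);
    memset(counters->counts, 0, sizeof counters->counts);
}
```

In dtrace itself the aggregation variables play the role of the counters, and the once-per-second output and reset replace the explicit call to flush_interval.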
You can also specify probes to match as an argument to the dtrace command. The following two examples demonstrate the aggregation functionality of dtrace by counting the number of samples within an aggregation bin.
The first example measures the amount of time, in microseconds, it takes to perform a lookup in the cache. The left-hand side is the number of microseconds, from zero to 20, and the right-hand side is the number of samples counted for each sample bin. The majority of samples are in the 4 to 6 microsecond range.
Next, the amount of time to compile a regular expression is graphed with the same units of measurement (zero to 20 microseconds).
It's easy to see that the majority of samples for compiling a regular expression are in the 14 microsecond range, and that the distribution is much more spread out than the cache lookup distribution. This also demonstrates the usefulness of the RegexKit cache: at 4 to 6 microseconds, a cache lookup is nearly three times faster than compiling the regular expression again.
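Counting samples per bin, as in the two distributions above, amounts to linear quantization. The sketch below assumes one-microsecond bins from 0 to 20 with an extra underflow bin and an extra overflow bin, which is how DTrace's lquantize() aggregating function lays out its buckets. The function and constant names are hypothetical and are not taken from RegexKit.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed range: 0 to 20 microseconds in steps of 1 microsecond. */
enum { LOWER_US = 0, UPPER_US = 20, STEP_US = 1 };

/* One bin per step, plus an underflow bin and an overflow bin. */
enum { BIN_COUNT = (UPPER_US - LOWER_US) / STEP_US + 2 };

/* Maps a sample to its bin:
 *   index 0              holds samples below LOWER_US,
 *   index BIN_COUNT - 1  holds samples at or above UPPER_US,
 *   indexes in between   hold one step each. */
size_t linear_bin_index(long value_us) {
    if (value_us < LOWER_US) {
        return 0;
    }
    if (value_us >= UPPER_US) {
        return BIN_COUNT - 1;
    }
    return (size_t)((value_us - LOWER_US) / STEP_US) + 1;
}

/* Adds one sample to the histogram. */
void record_sample(unsigned long bins[BIN_COUNT], long value_us) {
    assert(bins != NULL);
    bins[linear_bin_index(value_us)]++;
}
```

Feeding every measured duration through record_sample yields the right-hand column of such a distribution: the number of samples counted for each microsecond value.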
RegexKit includes a number of instruments tailored to RegexKit for Instruments.app. These are installed in /Developer/Library/Instruments/PlugIns automatically if Instruments.app is installed.
Instrument | Description |
---|---|
Cache Lookup Timing | Records the time it takes to retrieve a regular expression from the cache in microseconds. |
Collection Cache | Records the effectiveness of the Least Recently Used negative-hit sorted regex collection cache. |
Collection Timing | Records the time it takes to determine if a regular expression in a collection matches a target string in microseconds. |
Compile Errors | Records regular expressions that failed to compile due to an error. |
Compile Timing | Records the time it takes to compile a regular expression in microseconds. |
Lock Timing | Records timing information for multithreaded locks in microseconds. |
Match Errors | Records matches that result in an error. |
Match Timing | Records the time it takes to perform a match in microseconds. |
Per Second | Records per second statistics. |
Performance Notes | Records potential performance problems that the framework has detected. |
The use of Instruments.app will not be covered here. Nonetheless, using the provided instruments should be straightforward. Instruments.app also allows for the creation of your own DTrace scripts, so you can create or modify scripts to extract the information that you require.
Adding the framework to your project is fairly straightforward. These directions cover adding the framework to your project as an embedded private framework. An embedded private framework is just like a standard framework, such as Cocoa, except that unlike Cocoa, a copy of the embedded private framework is included inside your application's .app bundle in the My App.app/Contents/Frameworks directory.
Your application's executable file, which is in the My App.app/Contents/MacOS directory, is then dynamically linked to the embedded private framework. The linker records that the embedded private framework, and therefore the shared library that contains the code for the framework, is located within the application's bundle. Then, when your application is executed, the dynamic linker knows to look for the framework's shared library in the application's bundle and not in the standard framework search paths, such as /System/Library/Frameworks or /Library/Frameworks.
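To make the bundle layout concrete, the sketch below derives the embedded framework's location from the path of the executable by stepping up from Contents/MacOS to Contents and then down into Frameworks. embedded_framework_path is a hypothetical helper written for illustration; it is not part of RegexKit and does not reproduce what the dynamic linker actually does.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper, for illustration only.
 * Given ".../Contents/MacOS/<executable>" and a framework name,
 * writes ".../Contents/Frameworks/<framework>" into `out`.
 * Returns 0 on success, -1 if the path has too few components
 * or the result does not fit in `out`. */
int embedded_framework_path(const char *executable_path,
                            const char *framework_name,
                            char *out, size_t out_size) {
    assert(executable_path != NULL && framework_name != NULL && out != NULL);

    /* Drop the executable's file name, leaving ".../Contents/MacOS". */
    const char *last_slash = strrchr(executable_path, '/');
    if (last_slash == NULL) {
        return -1;
    }
    size_t macos_len = (size_t)(last_slash - executable_path);

    /* Walk back to the slash that precedes the "MacOS" component. */
    size_t i = macos_len;
    while (i > 0 && executable_path[i - 1] != '/') {
        i--;
    }
    if (i == 0) {
        return -1;
    }
    size_t contents_len = i - 1; /* length of ".../Contents" */

    int written = snprintf(out, out_size, "%.*s/Frameworks/%s",
                           (int)contents_len, executable_path, framework_name);
    return (written < 0 || (size_t)written >= out_size) ? -1 : 0;
}
```

For the executable My App.app/Contents/MacOS/My App and the framework name RegexKit.framework, the helper produces My App.app/Contents/Frameworks/RegexKit.framework, which is the directory described above.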
The following outlines the steps required to use the framework in your project.
Using the framework requires that you link your application to it and copy it into your application's bundle. Figure 1 shows a typical new application in Xcode.
You link to the framework as follows:
Add the framework to the resources that Xcode is aware of for your application by expanding the Frameworks group. Then, right-click on Linked Frameworks and choose the menu item shown in Figure 2.
Choose /Developer/Local/Frameworks/RegexKit.framework. Xcode will then ask which targets to add the framework to. Select your application if it is not already selected. When you have selected all the targets you would like to add the framework to, click the Add button. The RegexKit.framework should now appear within the Linked Frameworks group. Additionally, the framework should automatically appear under the Link Binary With Libraries build phase for your application as shown in Figure 3.
Next, you will need to add a Copy Files build phase to your application's target.
Within the Targets group, right-click on your application and choose the menu item shown in Figure 4.
A window titled Copy Files Phase for "Your Application" Info will appear. Choose the appropriate item from the pop-up menu (see Figure 5), leaving the Path field empty and the Copy only when installing checkbox deselected. The window should now look like Figure 5. When finished, close the window.
Finally, add the RegexKit.framework to the files to be copied. Choose the RegexKit.framework from Frameworks > Linked Frameworks and drag it to the newly created Copy Files build phase as shown in Figure 6.
For each of your fileName.m files that makes use of RegexKit.framework functionality, you will need to add a statement to include the RegexKit.h header. This is normally accomplished by adding the statement #import <RegexKit/RegexKit.h> to fileName.h. For example:
Optionally, although recommended, you can add the RegexKit.h header to the list of headers that Xcode precompiles. This can reduce compile times because the header is processed only once ahead of time, instead of each time that it is imported. By default, Xcode creates a file called Application_Prefix.pch that is within the Other Sources group. To include the RegexKit.h header in the header files that Xcode precompiles, you need to add a #import <RegexKit/RegexKit.h> statement to Application_Prefix.pch. A typical file would look something like:
Clean any targets that you have made changes to. The easiest way to do this is to clean all the targets from the menu bar and then select the Also Clean Dependencies and Also Remove Precompiled Headers checkboxes in the dialog that appears.
Rebuild the Code Sense Index. In order to make sure that Xcode's Code Sense feature includes the definitions from RegexKit.framework, it's a good idea to rebuild the Code Sense Index. From the menu bar, choose the appropriate command and click on the General tab in the window that appears. Then, within the General pane, click on the rebuild button that is near the bottom.
Your application is now set up to use the framework. When you compile your application, Xcode will copy all the files necessary to use the RegexKit.framework into your application's bundle.
The code for this framework is licensed under what is commonly known as the revised, 3-clause BSD-Style license.
Copyright © 2007-2008, John Engelhart
All rights reserved.
Redistribution and use in source and binary forms, with or without modification, are permitted provided that the following conditions are met:
THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.