
The release of Delphi 2 generated a lot of hype. 32-bit applications, no 64 KB limits, Win95 controls, long filenames, and registry access. I too was taken up by the hype. I was so excited by the release of Delphi 2 that I bought it the day it came out. When I got home, I threw the manuals aside to experiment with all the new features instead. In fact, it was several months before the hype wore off, I stopped my experiments, and decided to pick up the manuals to learn about the new, but less obvious features in Delphi 2.
One of the new features I was aware of was the inclusion of a new long-string type, the AnsiString. However, I didn't have any need for it in my early experimentation, so I decided to forgo it and use the standard string type. Imagine my surprise when I found out I had been using AnsiStrings all along and had never even noticed. They operated identically to standard strings, and even worked with the same processing functions. Of course, I wasn't able to harness their full power without knowing about their special features, but I was able to use them nonetheless, without even knowing it.
While AnsiStrings can be transparent to use, there are many features that aren't so obvious. Some of these features don't behave as you might first expect, either. Before you can use AnsiStrings to their full advantage, you need to know about their structure and features, as well as how the string manipulation routines work with them. In this article, we will explore these areas of AnsiStrings to see what they can and can't do, and how and why they function as they do. With a good understanding of AnsiStrings, you should be able to predict how they will function in most situations so you can use them appropriately.
Introduction to String Types
In Delphi 1, there were two basic string types:
Strings (now known as, and referred to from here on as, ShortStrings):
Advantages:
- Simple assignments and concatenations are possible, without the need for any function calls.
- Native string type of the VCL.
- Memory allocation and disposal is automatic.
Disadvantages:
- Memory is fully allocated immediately, not as needed.
- Maximum length of 255 characters.
- Cannot be directly converted or typecast into a PChar. Conversion requires explicitly allocating and releasing memory for the PChar, as well as calling a function to perform the assignment.
- Cannot be used to directly interface with the Windows API.
PChars (also known as C-Strings):
Advantages:
- No practical maximum length.
- Memory can be efficiently allocated only as needed.
- Native string type of the Windows API.
Disadvantages:
- No simple assignments or concatenations are possible. Functions must be used to perform these operations.
- Memory must be allocated and released manually.
- Cannot be converted or typecast into a ShortString. Functions must be used instead.
- Cannot be used to directly interface with the VCL.
Clearly, one string type cannot be used exclusively without sacrificing access to either the VCL or the Windows API. There wasn't any type of string that could fill the gap between the two, combining the advantages of both types. Delphi 2 has done much to change this. For example, PChars can now be assigned directly to ShortStrings. However, the biggest improvement to Delphi's string processing is the introduction of a third string type, the AnsiString.
AnsiStrings (also known as Long Strings):
Advantages:
- Simple assignments and concatenations are possible.
- Memory allocation and disposal is automatic, and is efficiently allocated only as needed.
- No practical maximum length.
Disadvantage:
- Requires 8 more bytes than a PChar.
AnsiStrings also have 2 other unique features:
- Can be directly assigned and typecast to ShortStrings and PChars. This allows you to use AnsiStrings to interface with both the VCL and the Windows API.
- Multiple AnsiStrings with the same value can safely share memory when possible, reducing memory usage, and generally increasing application performance. This will be described more completely under reference counting, below.
Structure of String Types
ShortString
A ShortString variable is an array of up to 256 chars. Elements [1] to [255] contain the actual characters of the string, while element [0] contains the length byte. The length byte indicates how many of the 255 characters are actually part of the string.
PChar
PChars are not arrays of chars like ShortStrings. Rather, a PChar is a pointer to a dynamically allocated array of chars. Since the array is dynamically allocated, the maximum size can also be dynamically adjusted, allowing memory to be conserved, and allocated only as needed. The characters in the array pointed to by a PChar begin in element [0] rather than element [1] as in a ShortString. This is because there is no length byte in a PChar. Instead, the string in a PChar is terminated by a null zero (ASCII #0). The size is determined by counting the number of characters before the null zero.
AnsiString
AnsiStrings are a hybrid of a PChar and a ShortString. Like a PChar, the AnsiString variable actually holds a pointer to a dynamically allocated, dynamically sized array of chars terminated by a null zero. However, the AnsiString is like a ShortString in that the characters begin at offset [1] rather than [0]. The AnsiString has no length byte in element [0], however, so routines that treat the string as null-terminated find its end only by the position of the null zero. In addition to the characters and null zero stored in the positive offset elements, an AnsiString also has information stored at negative offsets. The DWord at offset -4 is the length value. The DWord at offset -8 is the reference count. You may also hear or read that offset -12 contains an allocated length value. However, this is not used by the AnsiString itself. Rather, it is stored and used by the memory management routines.
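The negative offsets described above can be inspected directly. The following is only a sketch: it assumes the 32-bit layout given in this article (length value at offset -4, reference count at offset -8), which other compilers or later versions may not share. The names PLong, LengthField, and RefCountField are made up for illustration. PAnsiChar is used instead of PChar so the code keeps its meaning on later compilers where PChar is no longer an 8-bit type; in Delphi 2 the two are the same.

```pascal
program StringHeader;
{$APPTYPE CONSOLE}

type
  PLong = ^Longint; { hypothetical helper type for reading one DWord }

{ Returns the DWord stored 4 bytes before the first character.
  Assumes the header layout described in the article. }
function LengthField(const S: AnsiString): Longint;
begin
  if Pointer(S) = nil then
    Result := 0 { an empty AnsiString is a nil pointer and has no header }
  else
    Result := PLong(PAnsiChar(Pointer(S)) - 4)^;
end;

{ Returns the DWord stored 8 bytes before the first character. }
function RefCountField(const S: AnsiString): Longint;
begin
  if Pointer(S) = nil then
    Result := 0
  else
    Result := PLong(PAnsiChar(Pointer(S)) - 8)^;
end;

var
  S: AnsiString;
begin
  S := 'Delphi';
  UniqueString(S); { make sure S owns its own block of memory }
  WriteLn('Length value:    ', LengthField(S));
  WriteLn('Reference count: ', RefCountField(S));
end.
```

Both helpers take a const parameter so that passing the string does not itself create another reference.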
Length Value
It is a common misconception that the length value always matches the length of the string up to its null zero terminator. With AnsiStrings, this is not guaranteed. The length value is the number of characters held in the allocated memory space (not counting the null zero terminator), and Delphi's own string routines treat all of those characters as part of the string. In this article, that value is called the maximum length, while the actual string length is the number of characters before the first null zero, which is what the Windows API and other null-terminated routines see. In general, because of the way AnsiStrings are processed, the maximum length is the same as the actual string length. However, several operations, such as SetLength and individual character manipulation, can change either the actual string length or the maximum length without changing the other. Since the length value and the actual string length may be different,
Length(MyAnsiString)
should only be used to get the maximum length of the AnsiString. To get the actual string length, you should use
Length(PChar(MyAnsiString))
This will return the actual string length since the Length function works as expected on a PChar. In addition, you can also manually set the max length of an AnsiString. This is done by using the SetLength procedure. For example,
SetLength(MyAnsiString, 15)
will set the maximum length of MyAnsiString to 15, and place a null zero in element [16]. SetLength can be useful when manipulating a string through a PChar typecast, which is described below.
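The difference between the two lengths is easiest to see with a string that contains a null zero before its end. The sketch below is illustrative only; NullTerminatedLength is a made-up helper name, and PAnsiChar stands in for PChar so the code keeps its meaning on later compilers (in Delphi 2 they are the same type).

```pascal
program LengthDemo;
{$APPTYPE CONSOLE}

{ Counts the characters before the first null zero,
  the way a null-terminated routine would. }
function NullTerminatedLength(const S: AnsiString): Integer;
begin
  Result := Length(PAnsiChar(S));
end;

var
  S: AnsiString;
begin
  S := 'Hello, world';  { 12 characters }
  S[6] := #0;           { overwrite the comma with a null zero }

  WriteLn(Length(S));                { the length value is unchanged: 12 }
  WriteLn(NullTerminatedLength(S));  { characters before the null zero: 5 }

  { Bring the length value back in line with the null-terminated length. }
  SetLength(S, NullTerminatedLength(S));
  WriteLn(Length(S));                { now 5 }
end.
```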
Reference Count
A new feature unique to the AnsiString is the reference count. Since an AnsiString is just a pointer to a memory location, it is possible, under certain circumstances, for 2 AnsiStrings to point to the same block of memory. This is the idea behind reference counting. The reference count indicates how many AnsiStrings are currently sharing that block of memory. When a new AnsiString is allocated, it starts with a reference count of 1. If that AnsiString is then assigned to a second AnsiString, the second AnsiString will point to the same block of memory as the first, and the reference count will now be 2. Since there are now 2 AnsiStrings sharing the same block of memory, modifying one would appear to change the other AnsiString, too. Luckily, the AnsiString processing functions are smart enough to prevent this from happening. Any attempt to modify an AnsiString involves the following steps:
If the reference count of the string is 1 then modify the string. Otherwise do the following:
- Make a new copy of the string, which will automatically have a reference count of 1.
- Decrement the reference count of the original string (unless it is -1).
- Finally, perform the modification.
So, even if two AnsiStrings share the same block of memory, it is safe to modify one without fear of changing the other. This process of ensuring that only the intended AnsiString is modified is known as copy-on-write. Note that when a new AnsiString is allocated, it has a reference count of 1. This count is incremented for every AnsiString that shares that same block of memory, and decremented for every AnsiString that is copied to a new location, is destroyed, or goes out of scope. An AnsiString is destroyed when the object it belongs to is destroyed. An AnsiString declared local to a function or procedure goes out of scope when that function or procedure exits, even if it exits because of an exception. When the reference count of an AnsiString is decremented to 0, all memory allocated for it is released.
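Copy-on-write can be observed by comparing the pointers held by two AnsiString variables before and after one of them is modified. This is a small sketch rather than a look at the run-time library itself; SharesMemory is a made-up helper name.

```pascal
program CopyOnWrite;
{$APPTYPE CONSOLE}

{ True when both variables point at the same block of characters. }
function SharesMemory(const A, B: AnsiString): Boolean;
begin
  Result := Pointer(A) = Pointer(B);
end;

var
  A, B: AnsiString;
begin
  A := 'shared text';
  UniqueString(A);  { give A its own block with a reference count of 1 }

  B := A;           { only the pointer is copied; the count becomes 2 }
  WriteLn(SharesMemory(A, B));  { TRUE }

  B[1] := 'S';      { writing to B forces a private copy first }
  WriteLn(SharesMemory(A, B));  { FALSE }
  WriteLn(A);       { shared text }
  WriteLn(B);       { Shared text }
end.
```

UniqueString is called first so that A refers to allocated memory rather than to the literal itself.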
In Delphi 2, string constants and literals are stored in the AnsiString format. To prevent these constants and literals from ever being modified, they are given a reference count of -1. A reference count of -1 is never incremented or decremented, and therefore the memory associated with it is never released, which is correct since it was never really allocated. When an attempt is made to modify an AnsiString with a reference count of -1, the AnsiString is always copied to a new location (with a reference count of 1).
The use of reference counting has an additional benefit besides conserving memory. Through the course of an average program, strings are assigned to one another many times without ever being modified. Through reference counting, there is a performance gain in these situations, since the string characters are not copied to a new location. Only the pointer needs to be copied, and the reference count incremented. So the implementation of reference counting increases performance by reducing memory copies while ensuring safe manipulation of strings.
Typecasting / Assignments
One nice new feature of Delphi 2 is its improved capability for string assignments and typecasting (see figure 1). In Delphi 2, almost any type of string can be assigned or typecast to another type, without need of any special processing or memory management on the programmer's part. The only exceptions are that AnsiStrings can only be typecast, not assigned, to a PChar, while ShortStrings can be neither assigned nor typecast to a PChar. However, it is still possible to typecast a ShortString to an AnsiString, and then to a PChar, such as:
MyPChar := PChar(AnsiString(MyShortString));

Figure 1 - String Type Compatibility
When typecasting or assigning one string type to another, there are certain details you should be aware of.
When assigning or typecasting a ShortString or a PChar to an AnsiString, a new AnsiString is allocated, and the source string is copied to it. This means that any modifications to the AnsiString will not affect the source string. So in the example above, in which the ShortString is typecast to an AnsiString and then to a PChar, any modification of the PChar would not affect the ShortString.
When assigning or typecasting a PChar or an AnsiString to a ShortString, the source string is copied to the ShortString, and truncated to a length of 255, if necessary.
When typecasting an AnsiString to a PChar, there are several important details. First, no new memory is allocated; only a pointer is returned. If the AnsiString is nil (it has a maximum length of 0), a non-nil pointer to a null zero is returned. Otherwise, a pointer to element [1] of the AnsiString is returned, which will become element [0] of the PChar. It is worth noting here that since an AnsiString is stored as a pointer, it can be typecast to a Pointer type, as well as a PChar. However, if the AnsiString is nil (max length of 0), the Pointer typecast will return nil, unlike the PChar typecast.
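The difference between the two typecasts on an empty string can be checked with a few lines. This is a sketch only; IsNilAsPointer and IsNilAsPChar are made-up helper names, and PAnsiChar stands in for PChar so the code keeps its meaning on later compilers (in Delphi 2 they are the same type).

```pascal
program EmptyCasts;
{$APPTYPE CONSOLE}

{ The Pointer typecast exposes the raw pointer stored in the variable. }
function IsNilAsPointer(const S: AnsiString): Boolean;
begin
  Result := Pointer(S) = nil;
end;

{ The PChar typecast always yields something that can be dereferenced. }
function IsNilAsPChar(const S: AnsiString): Boolean;
begin
  Result := PAnsiChar(S) = nil;
end;

var
  S: AnsiString;
  P: PAnsiChar;
begin
  S := '';
  WriteLn(IsNilAsPointer(S));  { TRUE: an empty AnsiString is a nil pointer }
  WriteLn(IsNilAsPChar(S));    { FALSE: the typecast points at a null zero }

  S := 'abc';
  P := PAnsiChar(S);
  WriteLn(P[0] = S[1]);        { TRUE: element [1] becomes element [0] }
end.
```

This is why the PChar typecast is safe to hand to a routine expecting a null-terminated string, even when the AnsiString is empty.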
Also, when using a PChar typecast from an AnsiString, there are certain conditions you need to ensure. First, the AnsiString from which the PChar was typecast must not be modified directly from the time the typecast is performed until the PChar is no longer needed. Otherwise, you run the risk that the string will be moved to a new memory location, and the PChar will no longer point to the correct memory block. If the PChar will be used as read-only, there is nothing else special you need to do. However, if you wish to modify the string via the PChar, there are a few more conditions.
First, the AnsiString must be long enough (maximum length large enough) to contain any modifications that will be made. This means that if the PChar will be used to lengthen the string, you should use SetLength first. For example, if the PChar will be passed to a procedure which will fill the string with the current date in the MM/DD/YYYY format, it might be wise to do
SetLength(MyAnsiString, 10)
before making the call. Remember, the maximum length of an AnsiString does not count the null zero at the end, so a max length of 10, not 11, would be sufficient in this case. Second, the AnsiString must have a reference count of 1. This means two things. One, the AnsiString must not be sharing memory with another AnsiString. Otherwise, the PChar modification would modify all strings sharing that memory block. In rare cases, this might be desirable, but more often than not, it would introduce hard-to-find bugs. Two, the AnsiString must not point to a constant or literal, which have reference counts of -1. This is because modifying a constant or literal would not make sense, and would usually generate a run-time error. In either case, there are several ways to ensure that the reference count is always 1, including calls to SetLength or SetString. However, the easiest way is to call UniqueString before performing the typecast.
UniqueString(MyAnsiString);
MyPChar := PChar(MyAnsiString);
Once the typecast has safely been performed, you must also ensure that the AnsiString maintains its reference count of 1 until the PChar is no longer in use. This means it must not be assigned to another AnsiString, which would then share the same block of memory. Finally, after modifying an AnsiString through a PChar typecast, you must also ensure that the maximum length and the string length are equal. This can be done with either the following call to SetLength or the following assignment:
SetLength(MyAnsiString, Length(PChar(MyAnsiString)));
OR
MyAnsiString := PChar(MyAnsiString);
If you forget to do this step, you could run into a problem when modifying the AnsiString later. For example, suppose MyAnsiString has a max length of 20 and a string length of 15. This would mean that there is also a null zero in element [16]. If you now append the string '?', it will be appended after the maximum length of MyAnsiString, not after the null zero. So, MyAnsiString will have a maximum length of 21, with the '?' in element [21] and a null zero in element [22], as well as in element [16]. This creates two problems. Not only is the '?' not in the position you wanted, but it is also after a null zero. This means that when you use the string as a null-terminated string, you will never see anything after the first null zero, not even the '?'. The simple solution to this problem is to make the call to SetLength or to make the assignment, as shown above.
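The whole sequence described in this section (reserve room, typecast, let a routine fill the buffer, then trim) might look like the sketch below. FillDate is a hypothetical stand-in for any routine that writes a null-terminated string into a caller-supplied buffer, and FetchDate is likewise a made-up name. PAnsiChar stands in for PChar so the code keeps its meaning on later compilers (in Delphi 2 they are the same type).

```pascal
program ModifyViaPChar;
{$APPTYPE CONSOLE}

{ Hypothetical routine that writes a fixed, null-terminated date
  into the buffer it is given. The caller must supply enough room. }
procedure FillDate(Buf: PAnsiChar);
var
  Src: AnsiString;
  I: Integer;
begin
  Src := '12/31/1996';
  for I := 1 to Length(Src) do
    Buf[I - 1] := Src[I];   { PChar elements start at [0] }
  Buf[Length(Src)] := #0;   { terminate with a null zero }
end;

function FetchDate: AnsiString;
var
  S: AnsiString;
begin
  SetLength(S, 20);                    { room for 20 characters; S now has its own block }
  FillDate(PAnsiChar(S));              { S is not touched directly while the pointer is in use }
  SetLength(S, Length(PAnsiChar(S)));  { make the length value match the string length }
  Result := S;
end;

var
  D: AnsiString;
begin
  D := FetchDate + '?';
  WriteLn(D);          { 12/31/1996? }
  WriteLn(Length(D));  { 11 }
end.
```

Without the second SetLength, the '?' would be appended after element [20] and would sit behind the null zero written by FillDate.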
Concatenation of 2 Strings
When concatenating two strings, several steps take place. First, any explicit typecasting is performed. Then, using the new types, both strings are typecast to a single string type, which will also be the result type. If either of the strings is an AnsiString, the result type will be an AnsiString. If neither of the strings is an AnsiString but one of them is a ShortString, the result type will be a ShortString. If both strings are PChars, then a compile-time error will be raised. In this case, at least one of the PChars must first be explicitly typecast to either an AnsiString or a ShortString.
MyAnsiString := AnsiString(MyPchar1) + MyPChar2;
Alternatively, you could add a third string at the beginning of the concatenation. This string would simply be ''.
MyAnsiString := '' + MyPChar1 + MyPChar2;
Since the first string is an AnsiString by default, the compiler will automatically typecast the 2 PChars to AnsiStrings. Finally, both strings are concatenated to a string of the result type. If the result type is a ShortString, the result will also be truncated to a length of 255, if necessary.
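Wrapped in a function, the explicit typecast approach might look like this. It is a minimal sketch; JoinPChars is a made-up name, and PAnsiChar stands in for PChar so the code keeps its meaning on later compilers (in Delphi 2 they are the same type).

```pascal
program ConcatPChars;
{$APPTYPE CONSOLE}

{ Concatenates two null-terminated strings into a new AnsiString.
  Casting the first operand makes the result type an AnsiString,
  so the second operand is converted automatically. }
function JoinPChars(P1, P2: PAnsiChar): AnsiString;
begin
  Result := AnsiString(P1) + P2;
end;

begin
  WriteLn(JoinPChars('Long ', 'strings'));  { Long strings }
end.
```

The characters are copied into the result, so later changes to either source buffer do not affect it.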

Figure 2 - String Concatenation
Concatenation of 3 or More Strings
When three or more strings are concatenated, they will be concatenated two at a time, left to right, with innermost parentheses first. For this reason, it is important to pay careful attention to the ordering of the string types. For example, take three strings, SS1 and SS2 of type ShortString, and AS1 of type AnsiString, and assign a 200 character string to each of them. Now observe the result of the following:
MyAnsiString := AS1 + SS1 + SS2;
This will result in an AnsiString with a length of 600 being assigned to MyAnsiString. You can see this by examining the intermediate concatenations. AS1 + SS1 will result in an AnsiString of length 400 (200 + 200). The concatenation of that result with SS2, of length 200, will result in an AnsiString of length 600 (400 + 200), which is then assigned to MyAnsiString. Now observe the result of this second assignment.
MyAnsiString := SS1 + SS2 + AS1;
At first glance, this may appear to have the same result, but it doesn't. It will result in an AnsiString of length 455 being assigned to MyAnsiString. Again, this can be seen more clearly by examining the intermediate concatenations. First, SS1 + SS2 would appear to result in a string of length 400. However, both types are ShortStrings, so the result will be a ShortString, and will therefore be truncated to length 255. That will then be concatenated with AS1. Now, since one string, AS1, is an AnsiString, the result will be an AnsiString, and will have a length of 455 (255 + 200). This is easy to overlook, so it is very important to pay careful attention to the ordering of string types in concatenations of three or more strings. Since you will usually want to perform this type of concatenation without any truncation taking place, it might be a good idea to always typecast the first string to type AnsiString. This way, the result of every intermediate concatenation will always be an AnsiString, and you will get the final result without any truncation having ever taken place.
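The suggestion to typecast the first string can be sketched as follows. Repeated and JoinAll are made-up helper names used only to build the 200 character test strings and to hold the concatenation.

```pascal
program ConcatOrder;
{$APPTYPE CONSOLE}

{ Hypothetical helper: builds a string of N copies of C. }
function Repeated(C: AnsiChar; N: Integer): AnsiString;
var
  I: Integer;
begin
  SetLength(Result, N);
  for I := 1 to N do
    Result[I] := C;
end;

{ Typecasting the first operand makes every intermediate result
  an AnsiString, so nothing is truncated to 255 characters. }
function JoinAll(const SS1, SS2: ShortString; const AS1: AnsiString): AnsiString;
begin
  Result := AnsiString(SS1) + SS2 + AS1;
end;

var
  SS1, SS2: ShortString;
  AS1: AnsiString;
begin
  SS1 := Repeated('a', 200);
  SS2 := Repeated('b', 200);
  AS1 := Repeated('c', 200);
  WriteLn(Length(JoinAll(SS1, SS2, AS1)));  { 600 }
end.
```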
Conclusion
Clearly, the enhanced string support in Delphi 2, particularly the AnsiString, is very powerful. The AnsiString combines the power and flexibility of a PChar with the ease of use of a ShortString, bringing together the advantages of both types while eliminating the disadvantages of each. The only disadvantage of the AnsiString, the extra 8 bytes needed, is negligible, especially when you consider the new features unique to the AnsiString. Adding to this power is the fact that you can use AnsiStrings with little knowledge of the way they operate. However, it isn't until you fully understand them that you can reap the full benefits of the AnsiString and begin to appreciate just how powerful it really is.