String and text representation
Problem description
Managing text is a core feature that every programming language and development
framework should provide. Character strings (or just string variables)
are present in all C and C++ libraries, but how their content is encoded and
which dependencies they require is highly implementation specific.
- C character arrays need to be NULL-character terminated, which leads to buffer overruns when this requirement is not fulfilled.
- Allocation of C-like strings is library dependent. String contents must be copied in order to own them.
- The C++ std::string class has no stable binary interface (ABI) and is therefore incompatible between compiler vendors, releases and builds like Debug or Release.
- C++ std::string instances are mutable and can be altered at any time, which invalidates references to their content (e.g. the pointer returned by stdstr.c_str()).
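The first point can be illustrated in plain C. The following is a minimal sketch and not part of the GATE API: bounded_length is a hypothetical helper that determines the length of a character array without ever reading past its capacity, which is exactly the guarantee a plain strlen() call cannot give when the terminator is missing.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper (not a GATE function): returns the length of the
   text in `buf`, reading at most `capacity` bytes. If no NULL character
   is found inside the array, the capacity itself is the length. */
size_t bounded_length(const char *buf, size_t capacity)
{
    const char *terminator;

    assert(buf != NULL);
    terminator = (const char *)memchr(buf, '\0', capacity);
    return (terminator != NULL) ? (size_t)(terminator - buf) : capacity;
}
```

Calling strlen() on an array such as char raw[5] = { 'w', 'o', 'r', 'l', 'd' } is undefined behavior because no terminator exists, while bounded_length(raw, sizeof(raw)) stays inside the array. Carrying the length next to the pointer removes the need for the terminator altogether.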
Solution
The GATE Framework defines its own string structure (gate_string_t) to ensure
a stable and interchangeable API that can project string contents between
multiple languages and C or C++ dialects.
- GATE strings are consecutive BYTE characters, and the managing structure stores a pointer to the first character and a length value. Such strings do not need to end with a NULL character. If a NULL character is attached to the BYTE buffer, it is NOT counted in the length value. gate_string_t instances can manage static or external data by just holding the pointer-length pair. They can also contain an additional reference-counted string buffer that keeps track of the allocated content.
- Strings are dynamically created within a String Builder instance (gate_strbuilder_t), where any kind of manipulation and appending of content is fully supported. When all required manipulations are applied, the result of a string builder can be transferred into a GATE string structure.
- All GATE string instances are immutable and must not be modified after their creation. The only legal access direction is to read their content.
- The one and only text encoding within a GATE string structure is UTF-8. Whenever encoding or formatting operations are required, the contents of a string need to be UTF-8 encoded from construction onwards. Notice: To improve performance, there is no separate validation of input data for strings. Bytes are just taken and processed. But when it comes to string conversion or operating system communication, only UTF-8 contents can be treated correctly.
- String contents can be shared by the following methods:
  - copy creation: dynamically allocates a new byte-by-byte copy of the source string
  - cloning: shares a dynamically created string by incrementing its reference counter, OR creates a new dynamically allocated string in case of an unmanaged source
  - duplication: just duplicates the string reference of the source. If the source was dynamically allocated, the duplicate shares the content by reference counting. If the source was unmanaged, the duplicate just copies the unmanaged pointers without any further handling.
- It is explicitly allowed to create new shared string references of existing ones, where the new string only references a subset of the full string buffer (but holds a reference count to the full buffer). E.g. gate_string_substr() does no unnecessary copying; it shares the string buffer of its source but updates its pointer-length pair to address only the desired part of the original string.
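The layout described in this list can be sketched in plain C. The code below is an illustration under stated assumptions and not the actual gate_string_t implementation: all sketch_* names are hypothetical, the reference counter is not thread-safe, and error handling is reduced to returning an empty string when allocation fails.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical shared buffer: a reference counter followed by the bytes. */
typedef struct sketch_buffer {
    size_t refcount;
    char   data[];               /* flexible array member (C99) */
} sketch_buffer_t;

/* Hypothetical string: pointer-length pair plus an optional owner. */
typedef struct sketch_string {
    const char      *str;        /* first byte, NULL terminator not required */
    size_t           length;     /* byte count, excluding any terminator */
    sketch_buffer_t *buffer;     /* NULL for static or external data */
} sketch_string_t;

/* Unmanaged string: only the pointer-length pair is stored. */
sketch_string_t sketch_string_static(const char *text)
{
    sketch_string_t s = { text, strlen(text), NULL };
    return s;
}

/* Copy creation: allocates a new byte-by-byte copy of the source. */
sketch_string_t sketch_string_copy(const sketch_string_t *src)
{
    sketch_string_t s = { "", 0, NULL };
    sketch_buffer_t *buf = malloc(sizeof(*buf) + src->length + 1);

    if (buf != NULL) {
        buf->refcount = 1;
        if (src->length > 0) {
            memcpy(buf->data, src->str, src->length);
        }
        buf->data[src->length] = '\0';   /* attached, but not counted */
        s.str = buf->data;
        s.length = src->length;
        s.buffer = buf;
    }
    return s;
}

/* Substring: shares the source buffer and narrows the pointer-length pair.
   `offset` and `count` are clamped to the source length. */
sketch_string_t sketch_string_substr(const sketch_string_t *src,
                                     size_t offset, size_t count)
{
    sketch_string_t s;

    if (offset > src->length) {
        offset = src->length;
    }
    if (count > src->length - offset) {
        count = src->length - offset;
    }
    s.str = src->str + offset;
    s.length = count;
    s.buffer = src->buffer;
    if (s.buffer != NULL) {
        ++s.buffer->refcount;            /* keeps the full buffer alive */
    }
    return s;
}

/* Release: drops one reference and frees the buffer with the last one. */
void sketch_string_release(sketch_string_t *s)
{
    if (s->buffer != NULL && --s->buffer->refcount == 0) {
        free(s->buffer);
    }
    s->str = "";
    s->length = 0;
    s->buffer = NULL;
}
```

In this sketch a substring of a dynamically allocated string stays valid after the original string is released, because the substring holds its own reference to the full buffer. A substring of an unmanaged string only copies the adjusted pointer-length pair, so the caller remains responsible for the lifetime of the underlying data.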
C Example
    #include <gate/strings.h>

    int main()
    {
        gate_strbuilder_t builder = GATE_INIT_EMPTY;
        gate_string_t dynamic_text = GATE_INIT_EMPTY;
        gate_string_t suffix_text = GATE_INIT_EMPTY;
        gate_size_t position;
        /* static non-allocated string: */
        gate_string_t static_text =
            GATE_STRING_INIT_STATIC("world");

        /* build dynamic string-buffer: */
        gate_strbuilder_create(&builder, 0);
        gate_strbuilder_append_cstr(&builder, "Hello ");
        gate_strbuilder_append_string(&builder, &static_text);
        gate_strbuilder_append_cstr(&builder, " from ");
        gate_strbuilder_append_int32(&builder, 42);
        gate_strbuilder_append_cstr(&builder, " other realms");

        /* use dynamic string: */
        gate_strbuilder_to_string(&builder, &dynamic_text);
        gate_strbuilder_release(&builder);

        /* share everything after "world" without copying: */
        position = gate_string_pos(&dynamic_text, &static_text, 0);
        if(position != GATE_STR_NPOS)
        {
            gate_string_substr(&suffix_text, &dynamic_text,
                position + gate_string_length(&static_text), GATE_STR_NPOS);
        }

        /* cleanup */
        gate_string_release(&suffix_text);
        gate_string_release(&dynamic_text);

        return 0;
    }
C++ Example
    #include <gate/strings.hpp>

    int main()
    {
        using namespace gate;

        // static non-allocated string:
        static String const staticText =
            String::createStatic("world");
        StringBuilder builder;

        // build dynamic string-buffer:
        builder << "Hello " << staticText
                << " from " << 42 << " other realms";

        // use dynamic string:
        String dynamicText = builder.toString();
        size_t position = dynamicText.positionOf(staticText);
        if(position != String::npos)
        {
            // shares the buffer of dynamicText, no copy is made
            String suffix = dynamicText.substr(
                position + staticText.length());
        }

        return 0;
    }