genie-logo

Genie strings

Page updated October 27, 2009
There is not yet an official logo for Genie/Vala; the image above is just a placeholder.

Introduction

I introduced the string datatype on my page Genie data types.

Read that first. This page continues from there, with a focus on using the string functions.

The main places to look for official documentation:

http://references.valadoc.org/glib-2.0/string.html
http://library.gnome.org/devel/glib/stable/glib-String-Utility-Functions.html
/usr/share/vala/vapi/glib-2.0.vapi

These are the available functions:
canon, chomp, chr, chug, compress, concat, contains, down, escape, has_prefix, has_suffix, len, ndup, printf, replace, reverse, scanf, split, str, strip, substring, to_double, to_int, to_int64, to_long, to_ulong, up

Summary of string functions
canon
chomp        Removes trailing whitespace from string
chr
chug         Removes leading whitespace from string
compress
concat       Append one string to another
contains     Tests if substring is in string
down         Convert all letters to lower-case
escape
has_prefix   Looks whether string begins with prefix
has_suffix   Looks whether string ends with suffix
len          Returns the length of the string
ndup         Duplicates the first n bytes of a string
printf
replace      Replace a substring in string
reverse      Reverses string, 'abcde' becomes 'edcba'
scanf
split        Using a delimiter, splits into array of strings
strip        Removes leading and trailing whitespace from string
substring    Obtain substring at offset in string
to_*         Converts string to a scalar datatype
up           Convert all letters to upper-case
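
A minimal sketch trying out several of the functions in the table. The sample text and variable names are made up for illustration, and the calls assume the glib-2.0 bindings described on this page; other Vala versions may differ in detail.

```genie
[indent=4]
init
    var s = "  Puppy Linux  "
    /* strip() removes the leading and trailing whitespace */
    var t = s.strip()
    print "[%s]", t
    /* case conversion */
    print t.up()
    print t.down()
    /* tests that return true or false */
    if t.has_prefix("Puppy") do print "begins with Puppy"
    if t.has_suffix("Linux") do print "ends with Linux"
    if t.contains("py Li") do print "contains 'py Li'"
    /* reverse the characters */
    print t.reverse()
```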

A simple example

Trying out a few of the functions...
[indent=4]
init
    var s = "The quick brown fox"
    s = s.concat(" jumped over the")
    s += " lazy dog"
    var s2 = s.ndup(s.len())
    if s == s2 do print "s and s2 have the same strings"
    print s2
Notice that you don't need a compare function to test strings for equality; just use the generic == operator, or != to test if they are not equal.

A real application

I would like to show usage of the string functions to solve a real problem. In Puppy Linux we have a file /etc/rc.d/PUPSTATE, that has entries like this:
#The partition that has the pup_save file is mounted here...
PUP_HOME='/mnt/dev_save'
The file was designed to be sourced into a Bash script, which takes just one line:
. /etc/rc.d/PUPSTATE
Bash and friends, being interpreted systems, are able to evaluate source code at runtime, which effectively is what the "." operator does.

Genie, being a purely compiled language, does not offer runtime evaluation of source code, so more elaborate code is needed to achieve the same end.

This is long-winded, as we have to do some limited parsing of the source code in the PUPSTATE file...
[indent=4]
init
    var PUPSTATE = new dict of string,string
    var f = FileStream.open("/etc/rc.d/PUPSTATE","r")
    var a = new array of char[128]
    while f.gets(a) is not null /*read one line from file*/
        a[a.length - 1] = 0 /*make it null-terminated*/
        var s = (string)a /*cast array-of-char to a string*/
        s = s.strip() /*keep the stripped result (trailing newline removed too)*/
        if s.has_prefix("#") == true do continue
        var s2 = s.split("=",2) /*returns array of string*/
        if s2.length < 2 do continue /*skip blank lines and lines without "="*/
        PUPSTATE[s2[0]] = s2[1] /*add to dictionary*/

    for o in PUPSTATE.keys do print("%s = %s", o, PUPSTATE[o])
The variables have been read from /etc/rc.d/PUPSTATE and are now in a dictionary named PUPSTATE. That's fine. For example, one of the lines in /etc/rc.d/PUPSTATE is "PUPMODE=12", so now we have that as a key:value entry in the dictionary, and it is easy enough to access it later in the program:
    if PUPSTATE["PUPMODE"] == "12" do print "mode is 12"
But, an interesting question. Is it possible to add to the above example so that variables are actually created? Yes...
    var PUPMODE = PUPSTATE["PUPMODE"].to_int()
    print "%d", PUPMODE
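
One detail worth noting: a value such as PUP_HOME still carries the single quotes from the shell file, because split() only cuts the line at the "=". A minimal sketch of removing them with replace() follows. The sample line is hard-coded so the snippet stands alone, and the variable names are illustrative. It removes every single quote in the value, which is only a reasonable assumption for simple entries like the one shown.

```genie
[indent=4]
init
    /* a sample line, as it would be read from /etc/rc.d/PUPSTATE */
    var line = "PUP_HOME='/mnt/dev_save'"
    /* split into name and value at the first "=" */
    var s2 = line.split("=", 2)
    /* drop the shell quoting around the value */
    var cleaned = s2[1].replace("'", "")
    print "%s = %s", s2[0], cleaned
```

This should print PUP_HOME = /mnt/dev_save, without the quotes.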
Homework exercise
Rewrite the above example so that the dictionary is not required at all. All of the variables are read from /etc/rc.d/PUPSTATE and assigned as variables in the program, with correct int and string datatypes.

Syntax note for Genie newbie
The expression PUPSTATE["PUPMODE"].to_int() may seem strange. The function to_int() converts a string to an integer, but so far you have only seen it used in code like s.to_int(). However, Genie will accept anything on the left of the dot that resolves to a string. PUPSTATE["PUPMODE"] returns a string, so that's fine.
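
A short sketch of the same idea with other expressions on the left of the dot. The sample strings and names here are made up for illustration: an array element is a string, and so is the result of another string function, so either can be followed by a dot and a function name.

```genie
[indent=4]
init
    var words = "alpha,BETA,gamma".split(",")
    /* words[1] resolves to a string, so string functions apply to it */
    print words[1].down()
    /* strip() returns a string, so up() can be chained onto it */
    var s = "  Genie  "
    print s.strip().up()
```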

So far in this page we have worked with plain-vanilla C strings. However, there are some strings-on-steroids, known as GStrings...

The StringBuilder class

According to what I can glean from the Vala docs, strings are essentially immutable. That is, once created a string occupies a fixed amount of memory, and that's it. You can't just append to it in place, because it's sitting there in memory with other stuff on "either side".

Oh, but you can resize a string. These two ways of appending are equivalent:
    s:string = "abc"
    s = s + "xyz"
    s = s.concat("xyz")
But the thing is, the old memory allocation has to be freed and a new memory allocation made, with the contents copied across. Done over and over, this is slow.

If your program has to do a lot of string resizing, especially if in a loop, then there is a special string class called StringBuilder, which does it faster.

StringBuilder is actually Vala's name for the GString type in GLib. The GLib docs have this to say about a GString:

A GString is similar to a standard C string, except that it grows automatically as text is appended or inserted. Also, it stores the length of the string, so can be used for binary data with embedded null bytes.

Online documentation:
http://references.valadoc.org/glib-2.0/GLib.StringBuilder.html
http://library.gnome.org/devel/glib/stable/glib-Strings.html

These are the available functions:

append, append_c, append_len, append_printf, append_unichar, assign, erase, insert, prepend, prepend_c, prepend_len, prepend_unichar, printf

Summary of StringBuilder functions
append           Appends a string onto end of a GString
append_c         Appends a byte onto end of a GString
append_unichar   Converts a unicode char to UTF-8 and appends it
append_len
append_printf    Appends a formatted string onto end of GString
assign           Copy a string into GString, overwriting original
erase            Erase part of the GString
insert           Insert another string into GString
prepend_*        Ditto as append_*, but at start of GString
printf           Writes a formatted string into GString, replacing the previous contents

The idea here is that you can create a GString of the StringBuilder class, do all the resizing that you want, then access it as a normal C string to be able to use any of the functions further up this page ...with care though... here is an example, then I'll explain further...
[indent=4]
init
    var b = new StringBuilder()
    b.append("quick fox")
    b.prepend("The ")
    b.insert(10,"brown ")
    print b.str
Easy enough, but pay special attention to that last line. The string is printed, but str is not a function, it is a field of a structure. That's because a GString is actually a structure:

GString structure
The Glib documentation (see above URL) defines a GString as a struct:
typedef struct {
    gchar *str;
    gsize len;
    gsize allocated_len;
} GString;
The variable b is an instantiation of this structure, and b.str is the address of the actual string -- furthermore, that is a normal C null-terminated string.
So, GString is just a wrapper, with a normal C string inside!

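The reason I mentioned that you need to be careful is that you shouldn't use the ordinary string functions like concat() to resize the C string inside a GString; use the StringBuilder functions for that. Function concat() returns a newly allocated string and leaves b.str as it was, so the GString never sees the change. And if you were to write into b.str directly, the len and allocated_len fields would no longer match the string.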

It follows that you can get the length of the GString like this:
	x:long = b.len
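
A minimal sketch of the loop case mentioned earlier, where a string is built up piece by piece. The loop bounds and format string are arbitrary choices for illustration. Each append_printf() call grows the same GString, and the finished text is read back through b.str.

```genie
[indent=4]
init
    var b = new StringBuilder()
    /* append to the same buffer on every pass through the loop */
    for var i = 1 to 5
        b.append_printf("%d ", i)
    /* read the result as a normal string */
    print b.str
```

The same loop written with s = s + "..." would make a fresh string on every pass, which is the cost that StringBuilder is meant to avoid.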

The Regex class

Those of us with a Linux/Unix shell scripting background will be very familiar with utilities like grep and sed, which do string processing with regular expressions. Regular expressions are so useful that Puppy Linux has a special regular expression help page -- click the "Help" entry in the menu.

We can also use regular expressions in Genie. Here are online docs:
http://references.valadoc.org/glib-2.0/GLib.Regex.html
http://library.gnome.org/devel/glib/stable/glib-Perl-compatible-regular-expressions.html

Here is a simple example...
[indent=4]
init
    var r = new Regex ("jaguar|tiger|leopard")
    var s = "wolf, tiger, eagle, jaguar, leopard, bear"
    s = r.replace(s, s.len(), 0, "pussy")
    print s
We can also just look for a match and return a boolean true/false:
[indent=4]
init
    var DB_description = "A utility to monitor serial I/O"
    var r = new Regex (" system | print | printing | process | hardware | monitor",RegexCompileFlags.CASELESS)
    if r.match(DB_description) do print "System category"

Notice a difference where the regular expression object 'r' is defined: it can take an extra parameter. These are flags, and CASELESS means the comparison will ignore the case of letters. Here are all the possible flags:

        CASELESS,
        MULTILINE,
        DOTALL,
        EXTENDED,
        ANCHORED,
        DOLLAR_ENDONLY,
        UNGREEDY,
        RAW,
        NO_AUTO_CAPTURE,
        OPTIMIZE,
        DUPNAMES,
        NEWLINE_CR,
        NEWLINE_LF,
        NEWLINE_CRLF
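
A sketch of combining two of these flags with the | operator. The pattern and sample text are made up for illustration. Creating a Regex can fail with a RegexError if the pattern is invalid, so this version wraps the code in a try block; the examples above leave that out for brevity.

```genie
[indent=4]
init
    try
        /* CASELESS ignores letter case; MULTILINE lets ^ match at each line start */
        var flags = RegexCompileFlags.CASELESS | RegexCompileFlags.MULTILINE
        var r = new Regex("^monitor", flags)
        var text = "A utility\nMonitor serial I/O"
        if r.match(text) do print "matched"
    except e:RegexError
        print "bad pattern: %s", e.message
```

With both flags set, the pattern should match the second line of the sample text even though it begins with a capital M.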




Examples

What follows are examples to show various string operations and usage of the functions.

'contains' function

I had a need to read a string from the commandline and find out if it exists within another string, that is, is it a substring:
[indent=4]
init
    var DB_nameonly = args[1]
    var PKG_CAT_Desktop = " blackbox compiz desk_icon_theme_browndust desk_icon_theme_darkfire desk_icon_theme_original e16 fbpanel fluxbox fvwm gfontsel glipper gtk-chtheme gtk_theme_citrus_cut gtk_theme_fishing_the_sky gtk_theme_fishpie gtk_theme_gradient_brown gtk_theme_gradient_grey gtk_theme_m8darker gtk_theme_phacile_blue gtk_theme_polished_blue gtk_theme_stardust_zigbert gxset icewm jwm2 jwmconfig2 lxpanel metacity minixcal obconf openbox pupx rox_filer rox_filer twm wallpaper windowmaker xclipboard xclock xkbconfigmanager xlock_gui xlockmore "
    var noPATTERN = " "+DB_nameonly+" "
    if PKG_CAT_Desktop.contains(noPATTERN) do print "true"
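
The example above pads the name with a space on each side so that only whole words match. An alternative sketch, using a shortened made-up package list and illustrative variable names, is to split the list into words and compare each one with ==:

```genie
[indent=4]
init
    var pkgs = " blackbox compiz fluxbox jwm2 "
    var wanted = "fluxbox"
    var found = false
    /* strip the outer spaces first so split() yields no empty words at the ends */
    for name in pkgs.strip().split(" ")
        if name == wanted do found = true
    if found do print "true"
```

This is longer than the contains() version, but it does not depend on the list starting and ending with a space.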

'replace' function

This pretty much says it all:
[indent=4]
init
    var PKG_CAT_Desktop = " blackbox compiz desk_icon_theme_browndust desk_icon_theme_darkfire desk_icon_theme_original e16 fbpanel fluxbox fvwm gfontsel glipper gtk-chtheme gtk_theme_citrus_cut gtk_theme_fishing_the_sky gtk_theme_fishpie gtk_theme_gradient_brown gtk_theme_gradient_grey gtk_theme_m8darker gtk_theme_phacile_blue gtk_theme_polished_blue gtk_theme_stardust_zigbert gxset icewm jwm2 jwmconfig2 lxpanel metacity minixcal obconf openbox pupx rox_filer rox_filer twm wallpaper windowmaker xclipboard xclock xkbconfigmanager xlock_gui xlockmore "
    var new_s = PKG_CAT_Desktop.replace(" compiz ", " yabbidoo ")
    print "%s", new_s

'substring' function

This returns a substring at a certain offset in a string. In this example, the offset is 56 and the length of the substring is 23. The offset starts from zero, so actually the returned substring is from the 57th character:
[indent=4]
init
    var PKG_CAT_Desktop = " blackbox compiz desk_icon_theme_browndust desk_icon_theme_darkfire desk_icon_theme_original e16 fbpanel fluxbox fvwm gfontsel glipper gtk-chtheme gtk_theme_citrus_cut gtk_theme_fishing_the_sky gtk_theme_fishpie gtk_theme_gradient_brown gtk_theme_gradient_grey gtk_theme_m8darker gtk_theme_phacile_blue gtk_theme_polished_blue gtk_theme_stardust_zigbert gxset icewm jwm2 jwmconfig2 lxpanel metacity minixcal obconf openbox pupx rox_filer rox_filer twm wallpaper windowmaker xclipboard xclock xkbconfigmanager xlock_gui xlockmore "
    var new_s = PKG_CAT_Desktop.substring(56,23)
    print "%s", new_s







(c) Copyright 2008,2009 Barry Kauler puppylinux.com, all reproduction rights reserved.