Genie data types

Page updated October 27, 2009
Probably some more data types to be added to this page.


Those of us with a background in Fortran, C, BASIC etc., will know that when a variable is declared in the code, for example:
int x;
it will be instantiated, that is, created in memory at runtime. Of course, how long it exists and where it can be used depend on how it is declared -- is it global (usable anywhere in the program) or local (usable only within a particular section of code, like inside a function)?

What happens physically at runtime is that a location in memory is set aside to hold that variable value. This is a very clear concept: we have an address in memory, and the value stored at that address.

When we move on to object oriented languages, we have to get our minds around the idea of a variable being somewhat more than just a simple memory location: it is an object. What is actually created in memory is a data structure that the variable points to, with fields that contain various information, one of which may be the actual value of the variable, or a pointer to the value -- now, that might seem a backward step in terms of efficiency!
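To make that a bit more concrete, here is a rough sketch in plain C of an integer stored as an "object". The layout and the names (boxed_int, type_name, ref_count, boxed_int_new) are invented purely for illustration; real object systems differ in the details.

```c
#include <stdlib.h>

/* Hypothetical layout of an integer stored as an object: some
   bookkeeping fields plus the value itself. */
typedef struct {
    const char *type_name;  /* what kind of object this is */
    int ref_count;          /* how many users the object has */
    int value;              /* the actual value */
} boxed_int;

/* Create the object on the heap; the caller must free() it. */
boxed_int *boxed_int_new(int value) {
    boxed_int *obj = malloc(sizeof *obj);
    if (obj == NULL)
        return NULL;
    obj->type_name = "int";
    obj->ref_count = 1;
    obj->value = value;
    return obj;
}

/* Reading the value costs one extra step: follow the pointer first. */
int boxed_int_get(const boxed_int *obj) {
    return obj->value;
}
```

Compare that with a plain int x, which is just the value sitting at its address, with no extra fields and no pointer to follow.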

Actually, some object oriented languages have a mix, allowing basic variables to be created as non-objects, like our traditional languages. This is the case with Genie.

Another thing to consider is the concept of dynamic type handling, which Python has. What this means is that the type of the data, like integer or string, does not have to be declared in the source code and can dynamically change at runtime. This means of course that if such a variable holds, say, a 32-bit integer value at one point in time and a string at another, then the runtime has to keep track of what the variable currently holds, which is an extra level of complexity.
Python is able to do this quite easily as it is an interpreted language, with a runtime virtual machine that executes the source code (or more correctly the semi-compiled bytecode).
Python is completely object oriented, even the basic data types like integers.
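Here is a rough idea, sketched in C, of what that extra complexity looks like underneath. This is not how Python is actually implemented; the names (dyn_value, dyn_tag, dyn_add_one) are made up for the illustration.

```c
/* Hypothetical dynamically typed variable: a tag records which kind
   of value is currently stored. */
typedef enum { DYN_INT, DYN_STRING } dyn_tag;

typedef struct {
    dyn_tag tag;
    union {
        int i;
        const char *s;
    } as;
} dyn_value;

dyn_value dyn_from_int(int i) {
    dyn_value v;
    v.tag = DYN_INT;
    v.as.i = i;
    return v;
}

dyn_value dyn_from_string(const char *s) {
    dyn_value v;
    v.tag = DYN_STRING;
    v.as.s = s;
    return v;
}

/* Every operation has to check the tag first: that is the extra
   runtime work. Returns 0 on success, -1 if the value is not an int. */
int dyn_add_one(dyn_value *v) {
    if (v->tag != DYN_INT)
        return -1;
    v->as.i += 1;
    return 0;
}
```

The same variable can be re-assigned with dyn_from_string() after holding an integer, but then dyn_add_one() has to refuse it at runtime, something a statically typed language would catch at compile time.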

So, Genie is a hybrid, with the basic data types being non-object, and the rest being object types. There are two other classifications made here, "value types" and "reference types", which are explained as we go along. Now to consider each of these broad types...

Value types

Here is a simple program written in Genie:
i:int
i = 3
i = i + 1
print "%d",i
Or, initialise the variable when declaring it:
i:int = 3
i = i + 1
print "%d",i
And this is the C code that the 'valac' compiler generates:
gint i;
i = 3;
i = i + 1;
g_print ("%d\n", i);
Pretty straightforward, yes? There is not anything "object oriented" about that Genie code. Genie has created that simple integer variable as a traditional non-object variable, for maximum efficiency.

Genie always treats the basic scalar types as simple "non-object" value types, that is, all of these:

bool, char, uchar, unichar, int, uint, long, ulong, float, double

So, "value type" simply means that there is no indirection. There is an address in memory, and it contains the value. Also, no objects in sight anywhere here!

A quick note on what these scalar value types mean, on a 32-bit Intel x86 CPU architecture:
bool           true/false, theoretically only needs one bit
char, uchar    8-bit: -128 to +127, 0 to 255
unichar        Unicode character, 32-bit
int, uint      32-bit: -2,147,483,648 to +2,147,483,647, 0 to 4,294,967,295
long, ulong    32-bit, the same as int, on this architecture (64-bit on 64-bit Linux)

Also, where the size of a variable must not depend on the particular CPU architecture, the exact size can be specified:

int8 uint8 int16 uint16 int32 uint32 int64 uint64
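As a quick check of the idea, here is a small C sketch. I am assuming that the exact-size types correspond to the C99 <stdint.h> types of the same width; the function names are just for this illustration.

```c
#include <limits.h>
#include <stdint.h>

/* Widths in bits of some exact-size types: fixed, whatever the CPU. */
int bits_int8(void)  { return (int)(sizeof(int8_t)  * CHAR_BIT); }
int bits_int16(void) { return (int)(sizeof(int16_t) * CHAR_BIT); }
int bits_int32(void) { return (int)(sizeof(int32_t) * CHAR_BIT); }
int bits_int64(void) { return (int)(sizeof(int64_t) * CHAR_BIT); }

/* Width in bits of long: this one depends on the architecture and
   the compiler. */
int bits_long(void)  { return (int)(sizeof(long) * CHAR_BIT); }
```

The first four give the same answer everywhere, whereas bits_long() can differ between a 32-bit and a 64-bit system, which is exactly why the exact-size types exist.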

The following are also technically value types:

struct, enum


The Vala/Genie docs classify the structure data type as a value type. However, after much agonising I decided that the struct data type is really a special case, and I have treated it separately at the bottom of this page.


An enum is really just a list of '#defines' as used in C. Each entry equates to a numeric value, 0, 1, 2, etc., so if Tuesday is the second entry, myenum.Tuesday is actually the numeric value 1.
enum myenum
Enums don't actually need any storage at runtime.
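For comparison, this is roughly how the same idea looks in C. The day names here are my own placeholders, and day_number() is a made-up helper just to show that an entry is nothing more than an integer.

```c
/* Hypothetical day names; the entries are numbered 0, 1, 2, ... in
   the order they are listed. */
enum myenum {
    MYENUM_MONDAY,
    MYENUM_TUESDAY,
    MYENUM_WEDNESDAY
};

/* Once compiled, an enum entry is just an integer constant. */
int day_number(enum myenum day) {
    return (int)day;
}
```

So day_number(MYENUM_TUESDAY) gives 1, because Tuesday is the second entry and the numbering starts at 0.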

However, data types whose memory allocation may have to change at runtime, for example a string that may have text appended or removed, are better handled as reference types...

Simple reference types

A note about the term "reference types". I am using this in a more generic sense than is used in Vala and C++ documentation, as I am trying to make the explanation simpler. In other words, Vala docs define it more narrowly and have a whole lot more classifications, such as "pointer types" and "nullable types". Just so you know.
These "simple reference types" are data types for which there is a direct C equivalent. So, it is very easy to translate the code from Genie to C. These simple reference types include:

string, array


Here is some simple code:
s:string = "My name is Barry"
s = s + " Kauler"
print s
Looking at the generated C code:
char* s;
char* _tmp0;
char* _tmp1;
s = g_strdup ("My name is Barry");
_tmp0 = NULL;
s = (_tmp0 = g_strconcat (s, " Kauler", NULL), (s = (g_free (s), NULL)), _tmp0);
_tmp1 = NULL;
g_print ((_tmp1 = g_strconcat (s, "\n", NULL)));
_tmp1 = (g_free (_tmp1), NULL);
s = (g_free (s), NULL);
The most important thing to notice here is that s is a pointer to the string (look at the first line), which is where the word "reference" comes in.

The functions g_strdup() and g_strconcat() come from the libglib library, and do much the same job as strdup() and strcat() in the libc library. The g_* functions are a layer on top of libc, which they rely on for the underlying work such as memory allocation.

Ahem, and for those of us with a very limited understanding of C syntax, that long line can be expanded as (thanks to my Puppy-friends bob and rarsa for explaining this to me):
_tmp0 = g_strconcat (s, " Kauler", NULL);
g_free (s);
s = NULL;
s = _tmp0;
The rationale here is that as s is a pointer to the actual string, the string can easily be changed at runtime -- just create a new string in memory and change the pointer address.
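The same "build a new string, then re-point the pointer" idea can be sketched using only the standard C library. This is my own illustration, not valac output, and append_string() is a made-up name.

```c
#include <stdlib.h>
#include <string.h>

/* Append suffix to *s by building a new string and re-pointing *s at
   it. *s must have come from malloc(). Returns 0 on success, -1 if
   memory could not be allocated (in which case *s is left alone). */
int append_string(char **s, const char *suffix) {
    char *tmp = malloc(strlen(*s) + strlen(suffix) + 1);
    if (tmp == NULL)
        return -1;
    strcpy(tmp, *s);       /* copy the old text */
    strcat(tmp, suffix);   /* add the new text */
    free(*s);              /* release the old string */
    *s = tmp;              /* s now points to the new string */
    return 0;
}
```

Notice that the old string is never modified in place: it is freed and the pointer is simply changed to the new address, which is the same sequence as the four expanded lines.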

Posix profile

A small digression...
I mentioned on the main Genie page that the Vala compiler has an option to generate pure C code that uses only the standard C shared libraries and does not require the intermediary Glib/Gobject libraries. Yes, we can compile the above string example like this:
prompt> valac --profile=posix --ccode
Then the generated C code will call libc functions such as strdup() directly, instead of the g_* functions.

But I also mentioned that some of the full power of Vala/Genie is lost. I plan to give examples in these Genie pages that explore this, however for now, a short statement by Jan Hudec on the Vala-list (email list):
> > If you know at least the standard POSIX API, you can focus on the 
> > posix.vapi.
> > It does not use anything of glib. It is actually possible to use vala
> > without glib altogether (using --profile=POSIX parameter), though many
> > features are not available than.
> Not even that is familiar at this point :-| But the --profile=POSIX
> intrigues me. What kind of things would not be available using this
> POSIX profile?

Well, mainly all the features only available to GLib.Object, which are:
- signals (delegates should still work, though, so you can still define
hooks manually)
- properties (those with get/set -- plain fields of course work)
- interfaces
Than some other things depend of glib/gobject infrastructure, which are:
- exceptions
- the new async stuff
- dbus binding

There is a plan for new support runtime called dova, that should provide most
of the features while being lighter than glib, but I don't think any
timeframe for it was announced.


A simple example of an array:
a:array of string = {"abc", "def", "xyz"}
for s in a
	print s
...note that I didn't declare the type of s. Yeah, well, it can't be anything other than a string, and the compiler does have some intelligence! If I really wanted to, this would also be accepted:
for s:string in a
Well, there is no need to show the C code again. After all, that is something that is supposed to be of no concern to us. The whole idea of Genie is an easy-to-use language, to avoid the messy details of C coding.

But, once again, there is a line in the C code:
char** a;
In which a is a pointer, hence again a "reference" type.

The most essential point about these string and array data types is that the pointer in the generated C code points to the actual data. This is the most efficient way to do it.
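To show what "the pointer points to the actual data" means for an array of strings, here is a small C sketch. The function total_length() is made up for the illustration; it is not something valac generates.

```c
#include <stddef.h>
#include <string.h>

/* a points to the first of n string pointers, and each of those
   points directly to the text of one string. */
size_t total_length(const char **a, size_t n) {
    size_t total = 0;
    for (size_t i = 0; i < n; i++)
        total += strlen(a[i]);
    return total;
}
```

Called with the three strings "abc", "def" and "xyz" it walks the pointers one by one and adds up the lengths, with no object in between the pointer and the data.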

Still no objects anywhere to be seen!

Reference types as objects

Okay, so string and array (also struct, as explained further down) are basic C reference types, not object-oriented. However, Vala/Genie does provide OO wrapping for array, so you can declare that as an object...

So, redoing the array example, firstly declaring an uninitialised array (but it must have a size):
var a = new array of string[3]
a[0] = "abc"
a[1] = "def"
a[2] = "xyz"
The two important keywords used to create object a are var and new. It's quite readable:

var means that what follows is the declaration of variable a.
new means that an object is being instantiated, of class array.

Interesting that array can be declared in the old non-object way, or as an object.

There is a second group of reference types, that I have classified as "advanced" types, that have to be declared as objects. In this case, the pointer is to an object, which indirectly points to the actual data...

Advanced reference types

These data types contain much more sophisticated data formats than the C types. These must be declared using the new operator, which causes an object to be created. The advanced reference types include:

dict, list

As these advanced types are not supported in the standard C libraries, an extra library named libgee is required. This is a small shared library and must be installed in the system. Furthermore, the Vala compiler must be explicitly told to link it in on the commandline, for example:
# valac --pkg=gee-1.0
Libgee home page:

Describing each of these...


A dictionary is also known as a hash table, or an associative array. Here is a simple example:
var d = new dict of string,string
d["First name"]="Barry"
print "%s",d["First name"]
The concept is very simple: it is like a two-column table, with the first column being the "key" entries and the second being the "values". Unlike an array, whose elements you access by a numeric cell number, you access entries in a dictionary by their keys.
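The two-column table idea can be sketched in a few lines of C. Please note this is a deliberately naive version that searches the keys one after another; it is not a real hash table and not how libgee does it. The names (tiny_dict, tiny_dict_set, tiny_dict_get, DICT_MAX) are invented for the sketch.

```c
#include <stddef.h>
#include <string.h>

#define DICT_MAX 8   /* fixed capacity, to keep the sketch short */

typedef struct {
    const char *keys[DICT_MAX];     /* first column */
    const char *values[DICT_MAX];   /* second column */
    size_t count;
} tiny_dict;

/* Store value under key, replacing any existing entry. Returns 0 on
   success, -1 if the table is full. */
int tiny_dict_set(tiny_dict *d, const char *key, const char *value) {
    for (size_t i = 0; i < d->count; i++) {
        if (strcmp(d->keys[i], key) == 0) {
            d->values[i] = value;
            return 0;
        }
    }
    if (d->count == DICT_MAX)
        return -1;
    d->keys[d->count] = key;
    d->values[d->count] = value;
    d->count++;
    return 0;
}

/* Look up key; returns NULL if it is not present. */
const char *tiny_dict_get(const tiny_dict *d, const char *key) {
    for (size_t i = 0; i < d->count; i++) {
        if (strcmp(d->keys[i], key) == 0)
            return d->values[i];
    }
    return NULL;
}
```

A real dictionary finds the key much faster by hashing it, but the interface is the same: put a value in under a key, get it back out by the key.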


Lists are really the same thing as arrays, except that there are more powerful functions to operate upon them and they can be dynamically resized at runtime. An example:
var l = new list of string
print l[0]
Perhaps the array should be considered as deprecated in favour of list? See the earlier example where an array object was declared -- its size had to be declared also. Whereas, no size is needed when declaring the list object.
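For the curious, this is one common way that "no size needed" can work underneath, sketched in C: the storage is simply re-allocated bigger whenever it fills up. I have not checked that libgee does it this way, and the names (string_list, string_list_add) are my own.

```c
#include <stdlib.h>

typedef struct {
    const char **items;   /* the storage, grown on demand */
    size_t count;         /* how many items are in use */
    size_t capacity;      /* how many items the storage can hold */
} string_list;

/* Append item, growing the storage when it is full. Returns 0 on
   success, -1 if memory could not be allocated. */
int string_list_add(string_list *l, const char *item) {
    if (l->count == l->capacity) {
        size_t new_capacity = l->capacity ? l->capacity * 2 : 4;
        const char **grown = realloc(l->items, new_capacity * sizeof *grown);
        if (grown == NULL)
            return -1;
        l->items = grown;
        l->capacity = new_capacity;
    }
    l->items[l->count++] = item;
    return 0;
}
```

Start with an empty list (items set to NULL, count and capacity both 0), keep calling string_list_add(), and free() the items pointer when finished. A fixed-size array cannot do this, which is the practical difference between the two.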

All of the reference data types have many functions, also known as methods, to operate upon them; add(), which appends an item to a list, is one example. So, to really get productive using all these data types in Genie, you will need to know what functions are available, and how to use them...

Data functions

The documentation is online, and here are the two main places to look:

The functions are also described, in a cryptic format, locally if you have the Vala package installed, in "vapi" files. In particular, look at 'glib-2.0.vapi'. And, if you have libgee installed, look at 'gee-1.0.vapi'. Location:


String functions

The main places to look are:

Lots of functions there! Looking back at the string example I gave earlier in this page, note that I used the '+' operator to concatenate strings. However, there is also a string function named concat(). An example that uses some of these functions:
a:array of string = {"","","",""}
s:string = ""
s=s.concat("My name is Barry")
a=s.split(" ",4)
print("%s %s %s %s",a[0],a[1],a[2],a[3])
The split() function splits the string into an array of strings. In this example the delimiter is a space character, and the maximum number of fields that can be split off is 4.

There needs to be another page that goes into depth about what string functions are available and how to use them. Here it is: Genie strings.

List and dict functions

The main places to look are:


Example application:
var l = new list of string
for s in l
The insert() function creates this list: {"abc", "stuff", "xyz"}, and the remove() function takes out the first field.

There needs to be another page with more complete descriptions of list and dict manipulations!
Here it is: Gee collection data types.

Type inference

Notice above that I used the var prefix when declaring variables as objects. Well, var has a more generic use also -- for type inference.

Actually, any variables can be declared with var and the specific data type declaration left off. For example, rewriting the very first example on this page:
var i = 3
i = i + 1
print("%d",i) //notice the optional brackets!
What this does is leave it up to the Vala compiler to figure out what the data type should be.

If you wish you can declare all the variables in your program with var. It is a bit Pythonish not having to declare data types, but the typing is still static: once the compiler has worked out the data type of a variable, that type cannot change.

Type inference a "bad habit"
Python is a dynamically typed language, meaning not only do you not have to declare data types, the type of a variable can change at runtime. This may make coding "easier" but many consider it to be a "bad thing". The author of the Seed7 language is an example, and he also considers using type inference to be a "bad habit":

...having said that, var is very convenient and I use it frequently.

Structures, a special case

The struct data structure in Genie is exactly as you find in C. It is actually a reference type, but is a primitive C-level mechanism and Genie does not support instantiating it or any of its fields as an object. So, you cannot create it or any of its fields with the new keyword.

Boxed struct
However, Vala 0.5.4 introduced the boxed struct, which apparently does support structure objects. I don't yet know anything about it, though -- anyone want to check this out?

The following example has a function, addon(), and if the code here is not clear to you then have a read of my page Genie functions. A simple example:
struct mydatastructure
	a : string
	i : int

def addon(ss:mydatastructure) : int
	z:int = ss.i + 1
	return z

s:mydatastructure = {"abc", 123}
zz:int = addon(s)
s.a += "def"
print("%s %d", s.a, zz)
A reference data type is passed by reference, meaning its address, into a function. So, you see where I instantiated the structure s, then passed it by reference into addon(). Inside addon() I am able to access the fields of the structure.

However, the various fields of the structure are value or reference depending on their data type. For example, although s overall is to be considered a reference type, s.i is actually a value type (integer) and s.a is a reference type (string).
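Here is how I would write the equivalent by hand in C, to show the address being passed. This is my own sketch and I have not compared it line by line with what valac generates.

```c
typedef struct {
    const char *a;   /* a string: a pointer to the text (reference) */
    int i;           /* an integer: stored in the struct itself (value) */
} mydatastructure;

/* Receives the address of the structure, not a copy of it. */
int addon(const mydatastructure *ss) {
    int z = ss->i + 1;
    return z;
}
```

The call would be addon(&s), where the & passes the address of s. Inside the function, ss->i reaches through that address to the integer held in the structure.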

Another limitation is that the fields of a struct can only be the value types and the C reference types:

int, uint, char, uchar, unichar, bool, float, double, string, array, struct

Also the exact-size value types:

int8 uint8 int16 uint16 int32 uint32 int64 uint64

(c) Copyright 2008,2009 Barry Kauler, all reproduction rights reserved.