Chapter 6 - String Data Types
A string is a list of text characters. We tell BASIC that we are
dealing with text rather than variable names by enclosing the text in
double quotation marks. Applying this, you should be able to see why,
if we want to write 'Hello' on the screen we use:
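The two forms being compared can be sketched like this (a minimal reconstruction; the exact lines are an assumption):

```basic
REM Text in quotes is printed exactly as it stands
PRINT "Hello"
REM Without quotes, BASIC treats Hello as the name of a variable
PRINT Hello
```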
The second example would send BASIC scurrying off to its variable list
trying to find one called Hello. If it just so happened that you had
one, BASIC would print its value; most likely you won't, so BASIC will
report a "No such variable" error.
We can think of the way a string variable holds its value as a series
of memory locations, each of which holds a character, like this:
The quotes are not kept as part of the text; they are just used as
delimiters during programming. Each position is one byte in size, which
means it can hold a number in the range 0-255. So, if a byte can
only hold a number, how does it store letters like those above? The
answer is that the operating system has a lookup table which it uses to
translate your text into numeric codes for storing in memory and then
translate them back again when we want to print them out. The table is
called the ASCII table (American Standard Code for Information
Interchange; programmers love acronyms) and it provides a table of
corresponding letters, numbers, punctuation marks and other assorted
characters. When we store the letters for "Hello", BASIC actually
represents them internally as the codes 72, 101, 108, 108 and 111.
You can see the full table in the online help under Reference
Information. Note that not all the codes have a visible representation
and codes less than 32 may cause strange things to happen if you try to
print them. These lower codes are often referred to as control
characters; they represent things like horizontal tab, form feed etc.
Also, the codes above 127 are a non-standard standard (!) and so will
give different characters depending on the font that is being used.
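To see the stored codes for yourself, here is a short sketch. It leans on ASC and MID$, both of which are covered later in this chapter, and the variable names are only illustrative.

```basic
REM Print the ASCII code held in each position of a string
Word$ = "Hello"
FOR I% = 1 TO LEN(Word$)
  PRINT MID$(Word$, I%, 1); " = "; ASC(MID$(Word$, I%, 1))
NEXT I%
```

For "Hello" the codes shown should be 72, 101, 108, 108 and 111.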
Look at character 32, space. Space is normally filtered out by our
brains when we read text, it's there but we ignore it. To a computer,
space still needs a representation and so is given a value, just like
any of the other punctuation marks such as comma (code 44) or decimal
point / full stop (code 46). As BASIC is so picky about spaces, this
means that the two strings:
would be considered different. As stated, we tend to filter the space
out, but to the computer it's just another character code.
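A quick way to convince yourself is to compare two strings that differ only by one extra space. The strings below are made up for illustration.

```basic
REM Two strings that look alike but differ by one space
A$ = "Hello world"
B$ = "Hello  world"
IF A$ = B$ THEN PRINT "Same" ELSE PRINT "Different"
```

This should print Different, because B$ holds one more character with code 32 than A$ does.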
As numbers have limits, so too do strings. The limits are
the code of the character (0 to 255) and the number of characters the
string variable can hold, or the length of the string. In BB4W the
length can range between 0 and 65535 characters. Zero because you can
have a string with nothing in it. In fact when you first declare a
string variable, BASIC creates it with zero length, i.e. with nothing
in it. This may seem a little odd but is a useful concept. If at any
time you wish to set a string to hold nothing, this is how you do it:
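As a minimal sketch (Name$ is just an illustrative variable name):

```basic
REM Set a string variable to hold nothing at all
Name$ = ""
```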
That's double quotes with no gap between them. As a space is a
character, this is not the same as:
If you were to print them out, you would not see any difference, but
that, of course, doesn't mean they are the same. A string with nothing
in it is variously called an empty string or a null string. You'll come
across both terms.
Now we've got to grips with what a string is, what can we do with them?
With numeric variables, you can add, subtract, square root etc. Strings
are a little more limited. You can only add them:
    S1$ = "Hello" : REM example value (assumed)
    S2$ = ", "
    S3$ = "world" : REM example value (assumed)
    S4$ = S1$ + S2$
    S4$ = S4$ + S3$
    PRINT S4$
This is called concatenation, which is a fancy word meaning chain them
together. The program copies the contents of S1$, splices S2$ onto the
end of it and puts the result into S4$. The next assignment adds S3$
onto the end of S4$. There would be nothing to stop you doing all this
in one line:
    S4$ = S1$ + S2$ + S3$
This is the only mathematical operation that is allowed on strings.
None of the others make much sense anyway: how do you find the square
root of "Hello"? Don't for one moment think that's it, though: BASIC
has a very comprehensive set of functions for manipulating string
variables. These are dealt with in the following section.
One of the most useful things we can know about a string is its length.
The function LEN tells us exactly this. It must always have one
argument, though as is usual, this can be an expression. The result
must always be assigned to a numeric value or used in an expression
where a numeric value is expected. In immediate mode, try the following:
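A couple of lines along these lines would do. The strings are only illustrative, and the second line shows an expression used as the argument.

```basic
PRINT LEN("Hello")
PRINT LEN("Hello" + ", world")
```

These should print 5 and 12 respectively.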
LEN can be used to distinguish between empty strings and strings with
no visible characters:
    REM LEN of an empty string
    PRINT LEN(""), LEN(" ")
There are times when you want to be able to generate a repeating
pattern of text without typing it all in manually. STRING$ does just
this. It takes as its parameters a number of repetitions and a base
string. It returns a string which is the base string repeated the given
number of times:
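A small illustration, with a made-up base string:

```basic
REM Repeat the base string "Ab" three times
PRINT STRING$(3, "Ab")
```

This should print AbAbAb.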
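Here is a little program that will take a string then underline it.
    REM Underline using LEN and STRING$
    Title$ = "BBC BASIC"
    PRINT Title$
    PRINT STRING$(LEN(Title$), "-") : REM underline character assumed
Or, just to get carried away, we could put the title in a box:
    REM Box using LEN and STRING$
    Title$ = "BBC BASIC"
    PRINT STRING$(LEN(Title$) + 4, "*")
    PRINT "* ";Title$;" *"
    PRINT STRING$(LEN(Title$) + 4, "*")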
Although it's easy to create strings, there are times when we want to
inspect their contents. The function INSTR allows us to search a string
for a character or pattern of characters. INSTR takes two or three
arguments. The first is the string we wish to search. The second is a
string containing the characters we wish to search for. The third is
optional; we'll get to it in a minute. When supplied with two
parameters, INSTR returns the position in the string being searched at
which the first match for the search characters begins. This example
will return the position of the first letter C in the target string:
    PRINT INSTR("BBC BASIC", "C")
The first character in a string is position 1. If INSTR returns 0, it
means no match was found.
The optional third parameter can force INSTR to start at a position
other than 1. This means we can search the entire string by remembering
the last position returned and starting one character after that.
    REM INSTR Demo
    Posn% = INSTR("BBC BASIC", "C")
    PRINT "C found in position: ";Posn%
    Posn% = INSTR("BBC BASIC", "C", Posn% + 1)
    PRINT "C found in position: ";Posn%
Notice how we have to increment Posn% to get it past the first C. If
we hadn't, we would have started from position 3 again. As position 3
is a C, the search would have returned the same value again. If the
start position is larger than the length of the string, you get 0 (not
found) in return.
INSTR can also search for a sequence of characters in the target string.
The thing to be wary of here is how you specify the string to search for.
Searching for "FOR" in a string beginning "BBC BASIC FOR" will report a
match whether FOR is a word in its own right or merely the start of a
longer word, when clearly only the first case contains the word "FOR".
This again is because BASIC has no concept of language; it just looks
for a pattern of characters and, when it finds a match, stops. A more
correct way would be to search for the word together with the spaces
around it.
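The difference can be sketched with two target strings. Both are assumptions chosen for illustration, as is the padded search string.

```basic
REM Both of these report a match at position 11
PRINT INSTR("BBC BASIC FOR WINDOWS", "FOR")
PRINT INSTR("BBC BASIC FORMAT", "FOR")
REM Padding the search string with spaces matches only the whole word
PRINT INSTR("BBC BASIC FOR WINDOWS", " FOR ")
PRINT INSTR("BBC BASIC FORMAT", " FOR ")
```

The padded search finds the word at position 10 in the first string and returns 0 for the second. Bear in mind that it would also miss a word at the very start or end of a string, where there is no surrounding space.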
LEFT$ and RIGHT$
The next two functions return a subsection of a string and are dealt
with together as they are functionally similar.
LEFT$ takes two parameters: a target string and a number of characters.
It returns a string of that many characters, taken from the start of
the target string (position 1).
    PRINT LEFT$("Hello, world", 5)
If the number is greater than the total length of the string, you just
get the whole string.
    PRINT LEFT$("Hello, world", 100)
LEFT$ will also accept one parameter only:
    PRINT LEFT$("Hello")
This will return all the characters but the last one and is the same as:
    PRINT LEFT$("Hello", LEN("Hello")-1)
It is also possible to use LEFT$ as an assignment. In this mode, LEFT$
will overwrite the characters in the string with the ones being
assigned, starting at the first character. If you specify a number less
than the length of the replacement, BASIC will only overwrite the
number of characters specified. Should you specify more, BASIC will
only overwrite as many characters as there are in the replacement string.
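As a sketch of LEFT$ on the left of the equals sign (the strings are invented for illustration):

```basic
REM LEFT$ as an assignment
A$ = "Hello, world"
LEFT$(A$, 5) = "Howdy"
PRINT A$
```

With a 5-character replacement and a count of 5, this should print Howdy, world.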
RIGHT$ takes the same arguments as LEFT$ but returns the rightmost
number of characters.
    PRINT RIGHT$("Hello, world", 5)
Again, if the number is too big, you just get the whole string back.
With only one argument, RIGHT$ will return just the last character.
Predictably, when used in an assignment, RIGHT$ will overwrite the
characters at the end of the string.
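A minimal sketch of this, laid out so that the count sits on line 3 of the listing (the strings themselves are invented for illustration):

```basic
REM RIGHT$ as an assignment
A$ = "Hello, world"
RIGHT$(A$, 5) = "there"
PRINT A$
```

As written, this should print Hello, there.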
Exactly what happens if you specify fewer characters than the length of
the replacement string is probably best illustrated by example. Change
the 5 to 4 in line 3 above and see what happens. It starts 4 characters
from the end of the string and copies the first 4 characters from the
replacement string. If you tell BASIC to use more characters than are
contained in the replacement string, our friendly computer will
effectively derive its own number. Substitute 8 in line 3 and see. The
replacement doesn't start 8 characters away from the end of the string;
it merely works out that the replacement has 5 characters, and starts
5 characters from the end instead.
Please note that with both LEFT$ and RIGHT$, you cannot lengthen the
original string by giving more characters in the replacement than are
in the target. BASIC will just truncate the substitute string at the
length of the original string.
LEFT$ and RIGHT$ allow us to manipulate the start and end of a string,
but what happens if you want to extract from the middle? MID$ will do
this for us.
In its more common application, MID$ has three parameters: a string,
the start position and a number of characters. As with all strings, the
leftmost character is position 1. Try this:
    PRINT MID$("Fortune favours the bold", 9, 7)
This returns 7 characters starting at position 9, i.e. "favours" in this
case. If the last number is bigger than the number of characters left
in the string, you just get everything up to the end.
This case is so common that BASIC allows us to omit the final
parameter. If you do this, BASIC assumes that you want all the
characters from the start position to the end.
    PRINT MID$("Fortune favours the bold", 9)
OK, that was painless enough, but we're not finished. Like RIGHT$ and
LEFT$, MID$ can also be used on the other side of the equals sign. This
means that you can get BASIC to replace a section of a string:
    REM MID$ as an assignment
    A$ = "Give me patience!!"
    MID$(A$, 9, 8) = "strength"
    PRINT A$
From the above description, you should be able to guess what it's
doing. For completeness: line 3 takes the string "strength" which is 8
characters long and, starting at position 9 in A$, replaces the
characters one for one with the characters in "strength".
There are several things to be aware of when dealing with the number of
characters. Usually, the number is the same as the length
of the replacement string. If the number of characters specified is
shorter than the length of the replacement, only that number
of characters are copied.
Also, if the start position plus the number of characters would run
past the end of the target string, BASIC will only copy characters up
to the end of the target string and ignore anything after. A
13-character replacement such as "all your cash", for instance, is cut
short if fewer than 13 characters remain in the target. To put it
another way, BASIC will not extend the length of the target string.
You can leave out the number of characters. In this case BASIC assumes
the length of the replacement string, but still obeys the rules given
above.
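These rules can be tried out with a sketch like the following. It uses the string "Give me patience!!" from earlier; the start positions and counts are assumptions chosen for illustration.

```basic
REM MID$ assignment with a short count, then with too little room
A$ = "Give me patience!!"
MID$(A$, 9, 3) = "strength" : REM copies only "str"
PRINT A$
A$ = "Give me patience!!"
MID$(A$, 9) = "all your cash" : REM runs out of room in A$
PRINT A$
```

Going by the rules above, the first PRINT should show Give me strience!! and the second Give me all your c, since A$ stays 18 characters long throughout.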
Now for a little demo that uses INSTR, LEFT$ and MID$. Suppose we
have someone's full name and we want to separate it into first name and
surname. We know that the two names are separated by a space, so first
we use INSTR to locate the space. Then we copy all the characters up
to, but not including, the space into the string that keeps the first
name. Next we take all the letters starting after the space up to the
end and save them in the surname. Have a crack at this yourself first
before looking at my result if you want to; it's the only way to learn.
    REM Separate names
    FullName$ = "Joe Bloggs" : REM surname is a placeholder
    Posn% = INSTR(FullName$, " ")
    FirstName$ = LEFT$(FullName$, Posn% - 1)
    Surname$ = MID$(FullName$, Posn% + 1)
    PRINT "Your first name is: ";FirstName$
    PRINT "Your surname is: ";Surname$
How did you do? There are always as many ways to code the solution to a
program as there are people trying to code it, so if you got a
different solution that's fine. Also don't be upset if you didn't get
it completely right first go, I didn't: it's all part of the
learning process.
ASC and CHR$
We have already made the acquaintance of the ASCII table. It is very
useful to be able to find the codes that correspond to the letters and
vice versa. That's the job of ASC and CHR$.
ASC returns an integer which is the ASCII code for the character passed
as a parameter:
    PRINT ASC("A")
Gives 65, as expected.
    PRINT ASC("1")
Gives 49, which is the code for the character "1", NOT the value 1.
If the string is bigger than one character, ASC just returns the code
for the first character. To inspect other positions, we need to use
MID$:
    PRINT ASC(MID$("ABC", 2)) : REM example string assumed
which gives the code for the second character, "B".
As you may expect, CHR$ does the reverse of ASC: give it a number and
it will return a single character string containing the corresponding
character.
CHR$ is particularly useful for making strings out of the characters
you can't get on the standard keyboard.
This can be a useful technique for printing cursor control characters
or user defined characters, which are described in a later section.
If you give CHR$ a number which is bigger than 255, BASIC divides it by 256
and gives the character corresponding to the remainder.
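One sketch of using CHR$ for a character that isn't on the keyboard is printing a degree sign. It assumes the Windows ANSI character set, in which code 176 is the degree sign; the temperature figure is made up.

```basic
REM Build a string containing a character that is not on the keyboard
Temp$ = "The temperature is 21" + CHR$(176) + "C"
PRINT Temp$
```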
Tip: Printing quotation marks
If you want to print a double quote in a string, you can do it in two
ways. The first involves building a string using CHR$(34), which is
the code for double quote.
    Greeting$ = CHR$(34) + "Hello, world" + CHR$(34)
The other way is a little trick that BB4W allows us. You can
actually put the quote in the string, but you use two double quotes
together so BASIC knows that we want to print the quote character and
not end the string.
    Greeting$ = """Hello, world"""
As the quotes in this string are at the beginning and end, there are
three lots, which definitely looks odd. Take the beginning: the first
indicates the start of the string and the next two tell BASIC to store
a quote. The end is the same but in reverse.
VAL and STR$
The next two commands allow us to convert between numeric and string
values.
VAL takes as its argument a string representation of a number and
returns the numeric equivalent of that number.
If the string contains non-numeric information, it will convert until
it reaches the first non-numeric character or, if the non-numeric stuff
comes first, you just get 0 back.
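The three cases can be sketched as follows (the strings are only examples):

```basic
REM VAL converts the leading numeric part of a string
PRINT VAL("22")
PRINT VAL("22 degrees")
PRINT VAL("degrees 22")
```

These should print 22, 22 and 0.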
The counterpart of VAL is STR$, which you probably guessed. You might
also have guessed that this takes a number or numeric variable and
converts it into a string representation. Now we can add a number to a
string:
    A$ = "The temperature outside is " + STR$(21.6)
There are default settings which control the format of the string
produced. This is well documented in the online help and is changeable
at runtime if you require, but is a little beyond the scope of
this chapter.
The last string command that must be mentioned is EVAL. I'll give a
flavour of what it can do rather than a full description because it is
a powerful command. In essence, it allows you to evaluate the contents of
a string expression. Take the description of VAL, which
converts a string to a number. At some point programmers try,
inadvertently or otherwise, something like this:
VAL returns 22 as described above. Now try:
Not impressed? Try:
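One possible set of lines for these three steps, using strings chosen purely for illustration:

```basic
REM VAL stops at the first non-numeric character
PRINT VAL("22/7")
REM EVAL works out the whole expression
PRINT EVAL("22/7")
REM EVAL also understands built-in functions
PRINT EVAL("SQR(2)")
```

The first line prints 22; the other two print the results of the calculations.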
Take it from me, that's not something you get with any old BASIC. You
can pass any string expression and EVAL will evaluate it and return a
numeric or string value, just as if you had entered the code into a
line of a program. As demonstrated above, you can use internal BBC
BASIC functions (though commands like CLS etc. will not work). You can
even use variables within the program:
    Side1 = 3
    Side2 = 4
    Hyp = EVAL("SQR(Side1^2+Side2^2)")
    PRINT "Hypotenuse is: ";Hyp
The possibilities that this presents spiral off into infinity, so
that's all I'm going to say about it here.
1) Set a string to hold the days of the week like this:
    "Sun Mon TuesWed ThurFri Sat"
All names are 4 characters in length, including a space if necessary.
Given a number for a day, use MID$ to extract the correct abbreviation
for the day.
2) Set a string to hold your first name. Use MID$ and ASC to find the
ASCII codes of the letters in the name.
3) Set three strings to hold your first name, second name (if you
haven't got one, make it up) and surname. Use LEFT$ to find your
initials and concatenation to create a new string in the format "R. T.
© Peter Nairn 2006