Using Unicode with Vector Pascal

 

Paul Cockshott

 

ISO Pascal is defined using an alphabet of symbols all of which can be represented with ASCII. Vector Pascal uses Unicode to permit a wider range of symbols to be used in programs.

 

Programs should be submitted to the compiler as UTF-8 encoded Unicode. Since 7-bit ASCII is a subset of UTF-8, all valid ASCII-encoded Vector Pascal programs are also valid UTF-8 programs.
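The subset relationship can be checked outside the compiler. The following is a minimal sketch in Python (used here only for illustration, it is not Vector Pascal); the two source fragments are made up. A pure ASCII fragment yields identical bytes under both encodings, while the ← symbol needs a multi-byte UTF-8 sequence.

```python
# Illustrative check, not part of Vector Pascal: ASCII text is
# byte-for-byte identical when encoded as UTF-8.
ascii_source = "y := y + 1;"      # hypothetical ASCII-only fragment
unicode_source = "y ← y + 1;"     # same fragment using U+2190 for :=

ascii_bytes = ascii_source.encode("ascii")
utf8_bytes = ascii_source.encode("utf-8")
same_bytes = ascii_bytes == utf8_bytes

# The arrow lies outside ASCII, so UTF-8 spends three bytes on it.
arrow_utf8 = "←".encode("utf-8")

print(same_bytes)         # True
print(arrow_utf8.hex())   # e28690
```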

 

Letter based Identifiers

 

ISO Pascal allows the Latin letters A-Z to be used in identifiers. Vector Pascal extends this by allowing letters from the Greek, Cyrillic, Katakana and Hiragana character sets.

Alphabet   Position   Glyph   Code
Greek      low        Α       0391
           high       Ω       03a9
Cyrillic   low        А       0410
           high       Я       042f
Katakana   low        ゠      30a0
           high       ヺ      30fa
Hiragana   low        ぁ      3041
           high       ゔ      3094

 

 

Identifiers are case insensitive: upper case and lower case versions of a given letter are treated as equivalent. Thus the identifier δ may also be written Δ. Identifiers drawn from these alphabets are strings of letters and digits that start with a letter.
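The case rule can be modelled in a few lines. This is a Python sketch for illustration only, restricted to the alphabets that have case; the range bounds are the upper-case code points listed in the table, while the names CASED_RANGES, alphabet_of and same_identifier are hypothetical and say nothing about how the compiler itself classifies characters.

```python
# Illustrative model of case-indifferent identifiers. The ranges are the
# upper-case bounds from the table; the lookup itself is an assumption.
CASED_RANGES = {
    "Latin": (0x0041, 0x005A),      # A..Z
    "Greek": (0x0391, 0x03A9),      # Α..Ω
    "Cyrillic": (0x0410, 0x042F),   # А..Я
}

def alphabet_of(ch):
    """Return the alphabet whose upper-case range contains ch, or None."""
    upper = ch.upper()
    if len(upper) != 1:             # some letters upper-case to two characters
        return None
    code = ord(upper)
    for name, (low, high) in CASED_RANGES.items():
        if low <= code <= high:
            return name
    return None

def same_identifier(a, b):
    """Treat two identifiers as equal when they differ only in case."""
    return a.upper() == b.upper()

print(alphabet_of("δ"))             # Greek
print(same_identifier("δ", "Δ"))    # True
```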

Ideographic identifiers

Vector Pascal allows ideographs drawn from the unified Chinese, Japanese and Korean sets (Unicode range 4e00-9fff) to be used as identifiers.

 

Special Symbols

When using Unicode, certain mathematical operations that are written as a sequence of ASCII characters can instead be written as a single Unicode character.

 

 

 

Operation               ASCII form   Extended form   Unicode
Set membership          in           ∈               2208
Assignment              :=           ←               2190
Integer division        div          ÷               00f7
N-ary summation         \+           ∑               2211
N-ary product           \*           ∏               220f
Square root             sqrt         √               221a
Less than or equal      <=           ≤               2264
Greater than or equal   >=           ≥               2265
Not equal               <>           ≠               2260
Negation                not          ¬               00ac
Logical and             and          ∧               2227
Logical or              or           ∨               2228
Multiplication          *            ✕               2715
Index generation        iota         ⍳               2373
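The code points in the table can be turned back into glyphs mechanically, which is a handy check when an editor or font does not display them. The sketch below is in Python, for illustration only; the dictionary name EXTENDED_FORM and its layout are hypothetical and are not compiler internals.

```python
# ASCII form -> Unicode code point, copied from the table above.
EXTENDED_FORM = {
    "in": 0x2208, ":=": 0x2190, "div": 0x00F7, "\\+": 0x2211,
    "\\*": 0x220F, "sqrt": 0x221A, "<=": 0x2264, ">=": 0x2265,
    "<>": 0x2260, "not": 0x00AC, "and": 0x2227, "or": 0x2228,
    "*": 0x2715, "iota": 0x2373,
}

# Print each operator with its glyph and its size when stored as UTF-8.
for ascii_form, code in EXTENDED_FORM.items():
    glyph = chr(code)
    utf8_len = len(glyph.encode("utf-8"))
    print(f"{ascii_form:5} U+{code:04X} {glyph} ({utf8_len} bytes in UTF-8)")
```

Only ÷ and ¬ fit in two UTF-8 bytes; the other symbols take three.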

 

Example

The following program shows Unicode operators used in place of the ASCII forms used in previous releases of Vector Pascal.

 

Program proddemo;
{ prints the product of the integers 1..5 and its square root }
Var a: array[1..5] of Integer; y: integer;
Begin
  { Unicode version }
  a ← ⍳ 0;  { form integers from 1 to 5 }
  Writeln(a);
  y ← ∏ a;  { get their product }
  Writeln(y, √ y);
  { now using ASCII }
  a := iota 0;
  writeln(a);
  y := \* a;
  writeln(y, sqrt(y));
End.

 

Characters

The built-in char type of Pascal is represented with 16 bits in Vector Pascal. This allows any character in the Unicode Basic Multilingual Plane (code points 0000-ffff) to be handled.

 

Strings

Strings are held as arrays of char, with a length word held in the first character position. This is a simple extension of the mechanism used in Turbo Pascal, and it potentially allows strings to be up to 65535 characters long. The type STRING written without a length specification stands for a string of length MAXSTRING.
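The length-prefixed layout can be pictured as a row of 16-bit cells. The following Python sketch is an illustrative model of that description, not the actual memory layout used by Vector Pascal; pack_string, unpack_string and MAX_LENGTH are hypothetical names.

```python
from array import array

MAX_LENGTH = 0xFFFF  # largest value a 16-bit length word can hold

def pack_string(text):
    """Return 16-bit cells: cell 0 is the length, the rest are characters."""
    codes = [ord(ch) for ch in text]
    if len(codes) > MAX_LENGTH:
        raise ValueError("length does not fit in a 16-bit word")
    if any(code > 0xFFFF for code in codes):
        raise ValueError("character does not fit in a 16-bit cell")
    return array("H", [len(codes)] + codes)

def unpack_string(cells):
    """Read the length from cell 0 and decode that many characters."""
    length = cells[0]
    return "".join(chr(code) for code in cells[1:1 + length])

cells = pack_string("σ→Я")
print(list(cells))            # [3, 963, 8594, 1071]
print(unpack_string(cells))   # σ→Я
```

Because the length shares the 16-bit cell size of the characters, the upper bound of 65535 falls out of the representation directly.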

 

Read

When strings or characters are read from a text file, they are automatically converted from UTF-8 to the internal 16-bit Unicode format.

 

Write

When characters or strings are output to a text file, they are converted from the internal Unicode format to UTF-8.
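The two conversions are inverses of each other. As a rough model, written in Python for illustration only, read_text and write_text below are hypothetical stand-ins for what the run-time library does on input and output; the real routines are not shown in this document.

```python
def read_text(utf8_bytes):
    """Model of Read: UTF-8 bytes from a file become 16-bit character codes."""
    codes = [ord(ch) for ch in utf8_bytes.decode("utf-8")]
    if any(code > 0xFFFF for code in codes):
        raise ValueError("code point does not fit in a 16-bit char")
    return codes

def write_text(codes):
    """Model of Write: 16-bit character codes become UTF-8 bytes."""
    return "".join(chr(code) for code in codes).encode("utf-8")

file_bytes = "Δ ≤ 5".encode("utf-8")
chars = read_text(file_bytes)
print(len(file_bytes), len(chars))       # 8 5
print(write_text(chars) == file_bytes)   # True
```

The file holds 8 bytes but only 5 characters, since Δ takes two bytes and ≤ takes three in UTF-8.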

 

Example

Here is an example program that prints out a page of the Unicode character set. It illustrates the use of Unicode for variable names, Unicode within strings, and the manipulation of 16-bit characters.

 

 

Program printUnicode;

Procedure printpage(p: integer);
Var σ: string;  { a Greek variable name }
    c: char;
    i, j: integer;
Begin
  Writeln(p);
  For i := 0 to 15 do
  Begin
    σ := '→';  { Unicode arrow symbol in string }
    For j := 0 to 15 do
      { concatenate a 16 bit char onto a string }
      σ := σ + chr(j + 16*i + 256*p);
    Writeln(σ);
  End;
End;

Var page: integer;
Begin
  Write('Unicode page:');
  Readln(page);
  printpage(page);
End.
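The index arithmetic in the program means that page p covers the 256 code points from 256*p to 256*p + 255, laid out as 16 rows of 16 characters. The same arithmetic is sketched below in Python, for illustration only; page_rows is a hypothetical name, and the leading arrow of each row is left out.

```python
def page_rows(p):
    """Return 16 strings of 16 characters covering 256*p .. 256*p + 255."""
    rows = []
    for i in range(16):
        # Same expression as in the Pascal listing: j + 16*i + 256*p
        row = "".join(chr(j + 16 * i + 256 * p) for j in range(16))
        rows.append(row)
    return rows

rows = page_rows(3)            # page 3 covers U+0300..U+03FF, including Greek
print(hex(ord(rows[0][0])))    # 0x300
print(rows[9][1])              # Α
```

Row 9, column 1 of page 3 is code point 1 + 16*9 + 256*3 = 913, which is 0391 in hexadecimal, the Greek capital alpha from the identifier table.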

 

 

 

 
