Loading TEXTEDIT.BBC

Discussions related to mouse, keyboard and Graphical User Interface
KenDown
Posts: 30
Joined: Wed 04 Apr 2018, 06:36

Loading TEXTEDIT.BBC

Post by KenDown » Tue 03 Jul 2018, 08:52

One of the excellent programs Richard has included in his "Examples" is TextEdit. I've learned a lot from it and found it very useful.

Welsh can have a circumflex over any of its vowels. For example, the title of a famous hymn is

Calon Lân

and a line from another hymn is

Anghofio'i annwyl ŵyn

If you copy and paste those two lines into Notepad and save the file as UTF-8, you can load it in again and everything is fine. Start up TextEdit and load the file in and the result is "Calon Lân". I have changed to RichEdit20W and installed WINLIB5A instead of WINLIB5 and so on, but the load routine stubbornly refuses to recognise the UTF-8 encoding. Indeed, it goes so far as to add the three bytes which signal UTF-8 (the byte order mark, EF BB BF) to the start of the text, which is not very pleasing.

Does anyone else have a solution to this, please?
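
For reference, the three bytes in question are the UTF-8 byte order mark (&EF &BB &BF) that Notepad writes at the start of the file. Below is a minimal sketch of removing it once the text is in a string. FN_stripbom is a hypothetical helper, not part of TextEdit or the WINLIB libraries, and the sketch assumes the file contents (or at least the first line) have already been read into a string.

```bbcbasic
      REM Demonstration: a string that starts with the UTF-8 byte order mark
      bom$ = CHR$(&EF) + CHR$(&BB) + CHR$(&BF)
      raw$ = bom$ + "Calon"
      text$ = FN_stripbom(raw$)
      PRINT LEN(raw$)  : REM 8 bytes with the mark
      PRINT LEN(text$) : REM 5 bytes without it
      END

      REM Hypothetical helper: remove a leading UTF-8 byte order mark, if present
      DEF FN_stripbom(a$)
      IF LEFT$(a$,3) = CHR$(&EF) + CHR$(&BB) + CHR$(&BF) THEN = MID$(a$,4)
      = a$
```

This only tidies up the stray bytes. It does not by itself make the edit control interpret the rest of the text as UTF-8.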

Zaphod
Posts: 26
Joined: Sat 23 Jun 2018, 15:51

Re: Loading TEXTEDIT.BBC

Post by Zaphod » Tue 03 Jul 2018, 17:22

Would loading WINLIB5U and reading the manual section on Unicode help?

Z

KenDown
Posts: 30
Joined: Wed 04 Apr 2018, 06:36

Re: Loading TEXTEDIT.BBC

Post by KenDown » Tue 03 Jul 2018, 17:42

Curiously, no. Changing WINLIB5A to WINLIB5U does not result in the Unicode characters being loaded correctly, but it *does* cause the program to hang when I close down. Not sure why.

Not sure what you mean by "the manual". The BB4W help does not appear to have a section on Unicode (other than skiting about the ability to display multilingual character sets).

Incidentally, if I read the file line by line and PROCtype it in, everything displays and functions perfectly. It's just the loading that causes problems.

Thanks.

Zaphod
Posts: 26
Joined: Sat 23 Jun 2018, 15:51

Re: Loading TEXTEDIT.BBC

Post by Zaphod » Tue 03 Jul 2018, 20:39

I was referring to the Help or online documentation.

WINLIB5U in TextEdit.bbc works just as you would expect here.
Did you note that WINLIB5U, like WINLIB5A, has an additional parameter at the start? That should be the parent window handle, so pass @hwnd%. If you omit it you get a 'type mismatch' error immediately.
If you look at the code of WINLIB5U you will see that it uses the CreateWindowExW function, which takes the parent handle from this first parameter.
Perhaps that is the issue: use WINLIB5U and add the parent handle as the additional first parameter in the FN_createwindow call.
For me that allowed a saved file to load and display its UTF-8 strings correctly.


UPDATE: I can reproduce what you see now that I have tried the actual Welsh strings saved from Notepad as UTF-8.
With WINLIB5U, TextEdit just shows the 'w' without the accent if the text is saved and reloaded. The issue may be that the font in use does not have the Celtic encoding. When I try that string in other text editors, they warn me to find a font that has the accented w in its encoding, or the character will be lost. So I think it might not be a BB4W issue so much as a matter of having a font that supports Welsh. If anyone has Welsh fonts installed they could prove that. Previously I had been testing with other languages, such as those using Cyrillic, where TextEdit does work. Sorry to mislead you.

Z
Last edited by Zaphod on Wed 04 Jul 2018, 22:06, edited 2 times in total.

Zaphod
Posts: 26
Joined: Sat 23 Jun 2018, 15:51

Re: Loading TEXTEDIT.BBC

Post by Zaphod » Tue 03 Jul 2018, 21:01

Some additional stuff.

When you were asking about Unicode on the previous forum there was a question about some of the string functions not working on Unicode, which I was unaware of. Anyhow, I guess it must have been raining as I wrote these library functions to fill in some of the gaps:


      VDU 23,22,640;800;8,16,16,8
      *FONT Arial,16
      A$=" ΒΙΒΛΟΣ γενέσεως Ἰησοῦ Χριστοῦ "

      REM Test routines
      PRINT A$
      PRINT FN_ulen(A$)
      FOR i=1 TO 10
        PRINT FN_umid(A$,i,i);
        PRINT FN_ulen(FN_umid(A$,i,i))
      NEXT
      FOR i=1 TO 10
        PRINT FN_uleft(A$,i);
        PRINT FN_ulen(FN_uleft(A$,i))
      NEXT
      FOR i=1 TO 10
        PRINT FN_uright(A$,i);
        PRINT FN_ulen(FN_uright(A$,i))
      NEXT

      TIME=0
      FOR i=0 TO 100000
        b$=FN_uright(A$,3)
      NEXT
      PRINT TIME

      PRINT b$
      PRINT FN_ulen(b$)

      END

      REM Library functions

      DEF FN_umid(a$,st,num)
      REM UTF-8 MID$ replacement
      LOCAL S%, N%
      S%=FN_ucount(a$,0,st-1)+1
      N%=FN_ucount(a$,S%-1,num)-S%+1
      =MID$(a$,S%,N%)

      DEF FN_uleft(a$,num)
      REM UTF-8 LEFT$ replacement
      =LEFT$(a$,FN_ucount(a$,0,num))

      DEF FN_uright(a$,num):REM Search from right fastest with small selections.
      LOCAL A%, L%, I%, J%
      A%=!^a$
      FOR I% =LENa$-1 TO 0 STEP -1
        J%=A%?I%
        IF (J% AND &C0) <> &80 : L%+=1 :REM Anything not a continuation byte is the start of a character.
        IF L%=num EXIT FOR
      NEXT
      =RIGHT$(a$,LENa$-I%)

      DEF FN_ulen(a$)
      REM Finds Length in characters of UTF-8 string.
      LOCAL A%, L%, I%, J%
      A%=!^a$
      WHILE I%<LENa$
        J%=A%?I%
        CASE TRUE OF
            REM Character start bytes
          WHEN (J% AND &E0) = &C0 : L%+=1 : I%+=2
          WHEN (J% AND &80) = 0   : L%+=1 : I%+=1
          WHEN (J% AND &F0) = &E0 : L%+=1 : I%+=3
          WHEN (J% AND &F8) = &F0 : L%+=1 : I%+=4
          WHEN (J% AND &C0) = &80 : I%+=1 : REM Continuation byte. Should never execute!
          OTHERWISE I%+=1 : REM Invalid byte (&F8-&FF): skip it so the loop cannot stall
        ENDCASE
      ENDWHILE
      =L%

      DEF FN_ucount(a$, I%, nchars)
      REM I% start of count in bytes. Returns total count in bytes adding nchars to start posn.
      LOCAL A%, L%, J%
      A%=!^a$                       :REM Address of start of string
      WHILE L%<=nchars-1 AND I%<LENa$
        J%=A%?I%                    :REM Get next byte
        CASE TRUE OF
            REM Compare byte with UTF-8 start bytes. Order is most likely found.
          WHEN (J% AND &E0) = &C0 : L%+=1 : I%+=2
          WHEN (J% AND &80) = 0   : L%+=1 : I%+=1
          WHEN (J% AND &F0) = &E0 : L%+=1 : I%+=3
          WHEN (J% AND &F8) = &F0 : L%+=1 : I%+=4
          WHEN (J% AND &C0) = &80 : I%+=1 : REM Continuation byte. Should never execute!
          OTHERWISE I%+=1 : REM Invalid byte (&F8-&FF): skip it so the loop cannot stall
        ENDCASE
      ENDWHILE
      =I% :REM bytes used for nchars
This might be of use as you seem to need Unicode.
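
As a small worked example of why byte length and character length differ, here is a sketch using the Welsh word from the original question. The ŵ (U+0175) is built from its two UTF-8 bytes, &C5 &B5, so the listing does not depend on how the editor saves the source. FN_uchars is a hypothetical, compact counter written only for this illustration. It counts every byte that is not a continuation byte, and it assumes the string is well-formed UTF-8.

```bbcbasic
      REM "ŵyn" built from explicit UTF-8 bytes: ŵ is U+0175 = &C5 &B5
      w$ = CHR$(&C5) + CHR$(&B5) + "yn"
      PRINT LEN(w$)       : REM 4 bytes
      PRINT FN_uchars(w$) : REM 3 characters
      END

      REM Hypothetical helper: count the characters in a UTF-8 string
      DEF FN_uchars(a$)
      LOCAL I%, N%
      REM Guard: a FOR loop always runs at least once, even on an empty string
      IF a$ = "" THEN = 0
      FOR I% = 1 TO LEN(a$)
        REM Continuation bytes look like 10xxxxxx; everything else starts a character
        IF (ASC(MID$(a$,I%)) AND &C0) <> &80 THEN N% += 1
      NEXT
      = N%
```

LEN and MID$ work in bytes, which is why the UTF-8 replacements are needed whenever a string may hold accented characters.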

Z
