Module utf8


This file is part of the web2py Web Framework
Copyrighted by Massimo Di Pierro <mdipierro@cs.depaul.edu>
License: LGPLv3 (http://www.gnu.org/licenses/lgpl.html)

Created by Vladyslav Kozlovskyy (Ukraine) <dbdevelop©gmail.com>
       for Web2py project

Utilities and a class for managing UTF-8 strings
================================================

Classes

Utf8
Class for storing and manipulating UTF-8 strings

Functions

sort_key(s)
Uses the Unicode Collation Algorithm (UCA) (http://www.unicode.org/reports/tr10/) to sort UTF-8 and unicode strings and to compare UTF-8 strings

ord(char)
Returns the Unicode code point of the UTF-8 or unicode character *char*

chr(code)
Returns the UTF-8 character with Unicode code point *code*

size(string)
Returns the length of a UTF-8 string in bytes

truncate(string, length, dots='...')
Returns the string unchanged if it fits within the maximum *length*, otherwise truncates it and appends the *dots* suffix

Variables

repr_escape_tab = {1: u'\x01', 2: u'\x02', 3: u'\x03', 4: u'\x...

repr_escape_tab2 = {1: u'\x01', 2: u'\x02', 3: u'\x03', 4: u'\...

__package__ = 'gluon'

i = 31

Function Details

sort_key(s)

Uses the Unicode Collation Algorithm (UCA) (http://www.unicode.org/reports/tr10/)
to sort UTF-8 and unicode strings and to compare UTF-8
strings.

NOTE: pyuca is a very memory-hungry module! It loads the whole
      "allkeys.txt" file (~2 MB!) into memory. But this
      functionality is needed only when sort_key() is called as
      part of a sort() call or when Utf8 strings are compared.

So this is a lazy "sort_key" function which (ONLY ONCE, ON ITS
FIRST CALL) imports pyuca and replaces itself with the real
sort_key() function.

ord(char)

Returns the Unicode code point of the UTF-8 or unicode character *char*.

Assumes that *char* is a single UTF-8 or unicode character.
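
As a rough Python 3 illustration of this behaviour (not the module's own code), the helpers below accept either UTF-8 bytes or a text string. The names `utf8_ord` and `utf8_chr` are hypothetical and chosen to avoid shadowing the built-ins; `utf8_chr` mirrors the chr(code) function listed in the summary.

```python
def utf8_ord(char):
    """Code point of one character given as UTF-8 bytes or as str."""
    if isinstance(char, bytes):
        char = char.decode("utf-8")
    # The built-in ord() raises TypeError unless exactly one character remains.
    return ord(char)


def utf8_chr(code):
    """UTF-8 encoded bytes for the character with code point *code*."""
    return chr(code).encode("utf-8")


# Round trip for CYRILLIC SMALL LETTER YA (U+044F)
encoded = utf8_chr(0x44F)
code_point = utf8_ord(encoded)
```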

size(string)

Returns the length of a UTF-8 string in bytes.
NOTE! For a unicode string, the length of the corresponding
      UTF-8 string is returned.
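
The difference between character count and byte count can be seen in a short Python 3 sketch. The name `utf8_size` is hypothetical; this is an assumption about the behaviour described in the note, not the module's source.

```python
def utf8_size(string):
    """Number of bytes the string occupies when encoded as UTF-8."""
    if isinstance(string, str):
        # Text input: measure the corresponding UTF-8 encoding.
        string = string.encode("utf-8")
    return len(string)


word = "привет"              # six Cyrillic letters, two bytes each in UTF-8
char_count = len(word)
byte_count = utf8_size(word)
```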

truncate(string, length, dots='...')

Returns the string unchanged if it fits within the maximum *length*;
otherwise truncates it and appends the *dots* suffix.

args:
     length (int): max length of the string
     dots (str or unicode): suffix appended when the string is truncated

returns:
     (utf8-str): the original or truncated string
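
A minimal Python 3 sketch of this kind of truncation is shown below. It rests on several assumptions that the description above does not confirm: *length* is counted in characters rather than bytes, the suffix counts toward *length*, a string exactly *length* characters long is left unchanged, and the result is returned as UTF-8 bytes. The name `truncate_utf8` is hypothetical.

```python
def truncate_utf8(string, length, dots="..."):
    """Return *string* as UTF-8 bytes, shortened to *length* characters."""
    text = string.decode("utf-8") if isinstance(string, bytes) else string
    dots = dots.decode("utf-8") if isinstance(dots, bytes) else dots
    if len(text) > length:
        # Assumption: make room for the suffix inside the limit. max() guards
        # against a negative slice index when length < len(dots).
        text = text[:max(length - len(dots), 0)] + dots
    return text.encode("utf-8")


short = truncate_utf8("hello", 8)
cut = truncate_utf8("hello world", 8)
```

Working on decoded text rather than raw bytes keeps the cut from landing in the middle of a multi-byte character.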

Variables Details

repr_escape_tab

Value:

{1: u'\x01',
 2: u'\x02',
 3: u'\x03',
 4: u'\x04',
 5: u'\x05',
 6: u'\x06',
 7: u'\a',
 8: u'\b',
...

repr_escape_tab2

Value:

{1: u'\x01',
 2: u'\x02',
 3: u'\x03',
 4: u'\x04',
 5: u'\x05',
 6: u'\x06',
 7: u'\a',
 8: u'\b',
...
