
Module utf8

This file is part of the web2py Web Framework
Copyrighted by Massimo Di Pierro <mdipierro@cs.depaul.edu>
License: LGPLv3 (http://www.gnu.org/licenses/lgpl.html)

Created by Vladyslav Kozlovskyy (Ukraine) <dbdevelop©gmail.com>
       for Web2py project

Utilities and a class for managing UTF-8 strings
================================================

Classes
  Utf8
Class for storing and manipulating UTF-8 strings
Functions
 
sort_key(s)
The Unicode Collation Algorithm (UCA) (http://www.unicode.org/reports/tr10/) is used to sort UTF-8 and unicode strings and to compare UTF-8 strings
 
ord(char)
Returns the Unicode code point of the UTF-8 or unicode character *char*
 
chr(code)
Returns the UTF-8 character with the Unicode code point *code*
 
size(string)
Returns the length of a UTF-8 string in bytes...
 
truncate(string, length, dots='...')
Returns the original string if it fits within *length*; otherwise truncates it and appends the *dots* suffix to the end
Variables
  repr_escape_tab = {1: u'\x01', 2: u'\x02', 3: u'\x03', 4: u'\x...
  repr_escape_tab2 = {1: u'\x01', 2: u'\x02', 3: u'\x03', 4: u'\...
  __package__ = 'gluon'
  i = 31
Function Details

sort_key(s)

The Unicode Collation Algorithm (UCA) (http://www.unicode.org/reports/tr10/)
is used to sort UTF-8 and unicode strings and to compare UTF-8
strings.

NOTE: pyuca is a very memory-hungry module! It loads the whole
      "allkeys.txt" file (~2 MB!) into memory. But this
      functionality is needed only when sort_key() is called as
      part of a sort() call or when Utf8 strings are compared.

So this is a lazy "sort_key" function which (ONLY ONCE, ON ITS
FIRST CALL) imports pyuca and replaces itself with the real
sort_key() function.
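
The lazy replacement described above can be sketched roughly as follows. This is a Python 3 illustration of the general pattern only, not the module's actual code: `_build_real_sort_key` and `load_count` are hypothetical names, and a simple case-insensitive key stands in for the real pyuca collation keys so the example runs without pyuca installed.

```python
# Sketch of a lazy, self-replacing function (hypothetical names throughout).
load_count = 0  # counts how many times the expensive setup runs


def _build_real_sort_key():
    """Stand-in for the expensive import; pyuca itself is NOT used here."""
    global load_count
    load_count += 1
    # Assumption: a case-insensitive key instead of real UCA collation keys.
    return lambda s: s.casefold()


def sort_key(s):
    """Lazy stub: on its first call, build the real function and rebind the name."""
    global sort_key
    sort_key = _build_real_sort_key()  # replace this stub at module level
    return sort_key(s)


words = ["banana", "Apple", "cherry"]
# The lambda looks up the global name on every call, so it picks up the
# replacement after the first call.
result = sorted(words, key=lambda w: sort_key(w))
```

In this sketch the name is looked up again on each call, so the expensive setup runs once. Passing the stub object itself (`key=sort_key`) would keep calling the stub, which is why a pattern like this usually needs either a fresh name lookup or a cache inside the stub.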

ord(char)

Returns the Unicode code point of the UTF-8 or unicode character *char*.

Assumes that *char* is a single UTF-8 or unicode character.
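
As an illustration of the behavior described, here is a minimal Python 3 sketch. It is not the module's implementation (which targets Python 2 `str`/`unicode`): `code_point` and `utf8_char` are hypothetical names, with `bytes` standing in for a UTF-8 encoded string. `utf8_char` mirrors the chr(code) function listed in the function summary.

```python
def code_point(char):
    """Return the Unicode code point of one character.

    Assumption: bytes input holds exactly one UTF-8 encoded character.
    """
    if isinstance(char, bytes):
        char = char.decode("utf-8")
    if len(char) != 1:
        raise ValueError("expected exactly one character")
    return ord(char)  # built-in ord on a one-character str


def utf8_char(code):
    """Inverse direction: UTF-8 encoded bytes for a given code point."""
    return chr(code).encode("utf-8")


# CYRILLIC SMALL LETTER YA is U+044F and takes two bytes in UTF-8.
ya_from_bytes = code_point(b"\xd1\x8f")
ya_from_text = code_point("\u044f")
ya_encoded = utf8_char(0x044F)
```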

size(string)

Returns the length of a UTF-8 string in bytes.
NOTE! For a unicode string, the length of the corresponding
      UTF-8 encoded string is returned.
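
The difference between byte length and character count can be sketched as below. This is a Python 3 illustration under the assumption that `bytes` plays the role of a UTF-8 string and `str` the role of a unicode string; `byte_size` is a hypothetical name, not the module's function.

```python
def byte_size(string):
    """Return the number of bytes in the UTF-8 encoding of `string`."""
    if isinstance(string, str):
        # Text is encoded first, so the size of its UTF-8 form is returned.
        string = string.encode("utf-8")
    return len(string)


word = "привіт"              # six Cyrillic letters, two bytes each in UTF-8
char_count = len(word)       # counts characters
byte_count = byte_size(word) # counts bytes
```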

truncate(string, length, dots='...')

Returns the original string if it fits within *length*; otherwise
    truncates it and appends the *dots* suffix to the end

args:
     length (int): maximum length of the string
     dots (str or unicode): suffix appended when the string is truncated

returns:
     (utf8-str): the original or the truncated string
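
One way to read the description above is sketched here in Python 3. The sketch makes two assumptions the text does not state outright: *length* is measured in characters rather than bytes, and the *dots* suffix counts toward that limit. `truncate_text` is a hypothetical name, and it returns `str` rather than a Utf8 instance.

```python
def truncate_text(string, length, dots="..."):
    """Return `string` unchanged if it fits, else a shortened copy ending in `dots`."""
    # Work on text so that slicing never splits a multi-byte character.
    text = string.decode("utf-8") if isinstance(string, bytes) else string
    dots = dots.decode("utf-8") if isinstance(dots, bytes) else dots
    if len(text) > length:
        # Assumption: the suffix is included in the length budget.
        keep = max(length - len(dots), 0)
        text = text[:keep] + dots
    return text


short = truncate_text("short", 8)
cut = truncate_text("Hello, world", 8)
cut_cyrillic = truncate_text("привіт світ", 9)
```

Slicing decoded text instead of raw bytes is the design point here: cutting a UTF-8 byte string at an arbitrary offset could leave half of a multi-byte character at the end.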


Variables Details

repr_escape_tab

Value:
{1: u'\x01',
 2: u'\x02',
 3: u'\x03',
 4: u'\x04',
 5: u'\x05',
 6: u'\x06',
 7: u'\a',
 8: u'\b',
...

repr_escape_tab2

Value:
{1: u'\x01',
 2: u'\x02',
 3: u'\x03',
 4: u'\x04',
 5: u'\x05',
 6: u'\x06',
 7: u'\a',
 8: u'\b',
...