Package gluon :: Module utf8 :: Class Utf8

Class Utf8

object --+        
         |        
basestring --+    
             |    
           str --+
                 |
                Utf8

Class for storing and manipulating UTF-8 strings.

The basic assumption behind this class is: "ALL strings in the application are either UTF-8 or unicode, even when the plain str type is used. UTF-8 is only a "packed" version of unicode, so UTF-8 and unicode strings are interchangeable."

CAUTION! This class is slower than str/unicode! Do NOT use it inside intensive loops. Instead, decode the string(s) to unicode before the loop and encode the result back to UTF-8 after the intensive calculation.

The benefits of this class are shown in the doctests below.
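
The decode-before, encode-after pattern recommended in the caution above can be sketched as follows. This is only an illustration: it uses plain bytes and text objects instead of Utf8, it is written to run on Python 3, and reverse_words is a hypothetical helper that is not part of gluon.

```python
# Illustration only: plain bytes/str under Python 3, no Utf8 import.

def reverse_words(utf8_bytes):
    """Reverse each space-separated word of a UTF-8 encoded byte string.

    Hypothetical helper used to show the pattern; not part of gluon.
    """
    text = utf8_bytes.decode('utf-8')      # decode once, before the loop
    out = []
    for word in text.split(' '):           # the loop works on characters
        out.append(word[::-1])
    return ' '.join(out).encode('utf-8')   # encode once, after the loop


packed = '中文 字'.encode('utf-8')
result = reverse_words(packed)
```

The loop body never touches the encoded bytes, so the per-character cost of decoding is paid once rather than on every iteration.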

Instance Methods

__repr__(self)
Return a representation that keeps UTF-8 content unescaped. NOTE! This function is a clone of the web2py gluon.languages.utf_repl() function.

__size__(self)
Length of the UTF-8 string in bytes.

__contains__(self, other)
y in x

__getitem__(self, index)
x[y]

__getslice__(self, begin, end)
x[i:j]

__add__(self, other)
x+y

__len__(self)
len(x)

__mul__(self, integer)
x*n

__eq__(self, string)
x==y

__ne__(self, string)
x!=y

capitalize(self) -> string
Return a copy of the string S with only its first character capitalized.

center(self, length) -> string
Return S centered in a string of length width.

upper(self) -> string
Return a copy of the string S converted to uppercase.

lower(self) -> string
Return a copy of the string S converted to lowercase.

title(self) -> string
Return a titlecased version of S.

index(self, string) -> int
Like S.find() but raise ValueError when the substring is not found.

isalnum(self) -> bool
Return True if all characters in S are alphanumeric and there is at least one character in S, False otherwise.

isalpha(self) -> bool
Return True if all characters in S are alphabetic and there is at least one character in S, False otherwise.

isdigit(self) -> bool
Return True if all characters in S are digits and there is at least one character in S, False otherwise.

islower(self) -> bool
Return True if all cased characters in S are lowercase and there is at least one cased character in S, False otherwise.

isspace(self) -> bool
Return True if all characters in S are whitespace and there is at least one character in S, False otherwise.

istitle(self) -> bool
Return True if S is a titlecased string and there is at least one character in S.

isupper(self) -> bool
Return True if all cased characters in S are uppercase and there is at least one cased character in S, False otherwise.

zfill(self, length) -> string
Pad a numeric string S with zeros on the left, to fill a field of the specified width.

join(self, iter) -> string
Return a string which is the concatenation of the strings in the iterable.

lstrip(self, chars=None) -> string or unicode
Return a copy of the string S with leading whitespace removed.

rstrip(self, chars=None) -> string or unicode
Return a copy of the string S with trailing whitespace removed.

strip(self, chars=None) -> string or unicode
Return a copy of the string S with leading and trailing whitespace removed.

swapcase(self) -> string
Return a copy of the string S with uppercase characters converted to lowercase and vice versa.

count(self, sub, start=0, end=None) -> int
Return the number of non-overlapping occurrences of substring sub in string S[start:end].

decode(self, encoding='utf-8', errors='strict') -> object
Decodes S using the codec registered for encoding.

encode(self, encoding, errors='strict') -> object
Encodes S using the codec registered for encoding.

expandtabs(self, tabsize=8) -> string
Return a copy of S where all tab characters are expanded using spaces.

find(self, sub, start=None, end=None) -> int
Return the lowest index in S where substring sub is found, such that sub is contained within S[start:end].

ljust(self, width, fillchar=' ') -> string
Return S left-justified in a string of length width.

partition(self, sep) -> (head, sep, tail)
Search for the separator sep in S, and return the part before it, the separator itself, and the part after it.

replace(self, old, new, count=-1) -> string
Return a copy of string S with all occurrences of substring old replaced by new.

rfind(self, sub, start=None, end=None) -> int
Return the highest index in S where substring sub is found, such that sub is contained within S[start:end].

rindex(self, string) -> int
Like S.rfind() but raise ValueError when the substring is not found.

rjust(self, width, fillchar=' ') -> string
Return S right-justified in a string of length width.

rpartition(self, sep) -> (head, sep, tail)
Search for the separator sep in S, starting at the end of S, and return the part before it, the separator itself, and the part after it.

rsplit(self, sep=None, maxsplit=-1) -> list of strings
Return a list of the words in the string S, using sep as the delimiter string, starting at the end of the string and working to the front.

split(self, sep=None, maxsplit=-1) -> list of strings
Return a list of the words in the string S, using sep as the delimiter string.

splitlines(self, keepends=False) -> list of strings
Return a list of the lines in S, breaking at line boundaries.

startswith(self, prefix, start=0, end=None) -> bool
Return True if S starts with the specified prefix, False otherwise.

translate(self, table, deletechars='') -> string
Return a copy of the string S, where all characters occurring in the optional argument deletechars are removed, and the remaining characters have been mapped through the given translation table, which must be a string of length 256 or None.

endswith(self, prefix, start=0, end=None) -> bool
Return True if S ends with the specified suffix, False otherwise.

format(self, *args, **kwargs) -> string
Return a formatted version of S, using substitutions from args and kwargs.

__mod__(self, right)
x%y

__ge__(self, string)
x>=y

__gt__(self, string)
x>y

__le__(self, string)
x<=y

__lt__(self, string)
x<y

Inherited from str: __format__, __getattribute__, __getnewargs__, __hash__, __rmod__, __rmul__, __sizeof__, __str__

Inherited from str (private): _formatter_field_name_split, _formatter_parser

Inherited from object: __delattr__, __init__, __reduce__, __reduce_ex__, __setattr__, __subclasshook__

Static Methods

__new__(cls, content='', codepage='utf-8') -> a new object with type S, a subtype of T

Properties

Inherited from object: __class__

Method Details

__new__(cls, content='', codepage='utf-8')
Static Method

Returns: a new object with type S, a subtype of T
Overrides: object.__new__
(inherited documentation)

__repr__(self)
(Representation operator)

Note that raw strings are used to avoid having to double the backslashes below. NOTE! This function is a clone of the web2py gluon.languages.utf_repl() function.

Utf8.__repr__() works the same as str.__repr__() when processing ASCII strings:

>>> repr(Utf8('abc')) == repr(Utf8("abc")) == repr('abc') == repr("abc") == "'abc'"
True
>>> repr(Utf8('a"b"c')) == repr('a"b"c') == '\'a"b"c\''
True
>>> repr(Utf8("a'b'c")) == repr("a'b'c") == '"a\'b\'c"'
True
>>> repr(Utf8('a\'b"c')) == repr('a\'b"c') == repr(Utf8("a'b\"c")) == repr("a'b\"c") == '\'a\\\'b"c\''
True
>>> repr(Utf8('a\r\nb')) == repr('a\r\nb') == "'a\\r\\nb'"  # Test for \r, \n
True

Unlike str.__repr__(), Utf8.__repr__() keeps the UTF-8 content as-is when processing UTF-8 strings:

>>> repr(Utf8('中文字')) == repr(Utf8("中文字")) == "'中文字'" != repr('中文字')
True
>>> repr(Utf8('中"文"字')) == "'中\"文\"字'" != repr('中"文"字')
True
>>> repr(Utf8("中'文'字")) == '"中\'文\'字"' != repr("中'文'字")
True
>>> repr(Utf8('中\'文"字')) == repr(Utf8("中'文\"字")) == '\'中\\\'文"字\'' != repr('中\'文"字') == repr("中'文\"字")
True
>>> repr(Utf8('中\r\n文')) == "'中\\r\\n文'" != repr('中\r\n文')  # Test for \r, \n
True

Overrides: object.__repr__
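
The contrast described above, escaping non-ASCII content versus keeping it readable, can also be seen with two Python 3 built-ins. The sketch below is only an analogy under that assumption: it does not use Utf8 and says nothing about how Utf8.__repr__() is implemented.

```python
# Python 3 analogy: repr() keeps printable non-ASCII characters,
# while ascii() escapes them. Control characters are escaped by both.
text = '中\r\n文'

kept = repr(text)      # non-ASCII characters stay readable
escaped = ascii(text)  # non-ASCII characters become \u escapes
```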

__contains__(self, other)
(In operator)

y in x

Overrides: str.__contains__
(inherited documentation)

__getitem__(self, index)
(Indexing operator)

x[y]

Overrides: str.__getitem__
(inherited documentation)

__getslice__(self, begin, end)
(Slicing operator)

x[i:j]

Use of negative indices is not supported.

Overrides: str.__getslice__
(inherited documentation)
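
Why character-based slicing matters for UTF-8 data can be shown with plain Python 3 objects. The sketch assumes that Utf8 slices by character rather than by byte, which this page implies but does not state. It does not import Utf8.

```python
# Slicing raw UTF-8 bytes can cut a multi-byte character in half;
# slicing the decoded text always yields whole characters.
text = '中文字abc'
packed = text.encode('utf-8')

byte_slice = packed[0:2]                  # first 2 bytes of a 3-byte character
char_slice = text[0:2].encode('utf-8')    # first 2 characters, re-encoded

try:
    byte_slice.decode('utf-8')
    byte_slice_is_valid = True
except UnicodeDecodeError:
    byte_slice_is_valid = False
```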

__add__(self, other)
(Addition operator)

x+y

Overrides: str.__add__
(inherited documentation)

__len__(self)
(Length operator)

len(x)

Overrides: str.__len__
(inherited documentation)
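
Since __size__ is documented as the length in bytes, len(x) presumably counts characters. The difference between the two measures can be sketched with plain Python 3 objects; Utf8 itself is not used here.

```python
# Character count versus byte count for the same UTF-8 content.
text = '中文字'
packed = text.encode('utf-8')

char_count = len(text)     # number of characters
byte_count = len(packed)   # number of bytes in the UTF-8 encoding
```

Each of these three characters takes three bytes in UTF-8, so the two counts differ by a factor of three.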

__mul__(self, integer)

x*n

Overrides: str.__mul__
(inherited documentation)

__eq__(self, string)
(Equality operator)

x==y

Overrides: str.__eq__
(inherited documentation)

__ne__(self, string)

x!=y

Overrides: str.__ne__
(inherited documentation)

capitalize(self)

Return a copy of the string S with only its first character capitalized.

Returns: string
Overrides: str.capitalize
(inherited documentation)

center(self, length)

Return S centered in a string of length width. Padding is done using the specified fill character (default is a space)

Returns: string
Overrides: str.center
(inherited documentation)
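
Padding methods such as center, ljust and rjust depend on how the string length is measured. The sketch below uses plain Python 3 bytes and text, not Utf8, to show that padding raw UTF-8 bytes to a width counts bytes, while padding decoded text counts characters.

```python
# Padding to width 5: bytes are measured in bytes, text in characters.
text = '中'
packed = text.encode('utf-8')         # 3 bytes

bytes_centered = packed.center(5)     # padded to 5 bytes
chars_centered = text.center(5)       # padded to 5 characters

visible_width_of_bytes = len(bytes_centered.decode('utf-8'))
visible_width_of_text = len(chars_centered)
```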

upper(self)

Return a copy of the string S converted to uppercase.

Returns: string
Overrides: str.upper
(inherited documentation)
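
Case conversion is another place where byte strings and decoded text behave differently. As a sketch with plain Python 3 objects (Utf8 is not imported), upper() on UTF-8 bytes only touches ASCII letters, while upper() on the decoded text handles non-ASCII letters as well.

```python
# Case conversion on UTF-8 bytes versus decoded text.
text = 'привет'
packed = text.encode('utf-8')

bytes_upper = packed.upper()                   # ASCII-only: bytes unchanged here
unicode_upper = text.upper().encode('utf-8')   # decode, convert, encode back
```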

lower(self)

Return a copy of the string S converted to lowercase.

Returns: string
Overrides: str.lower
(inherited documentation)

title(self)

Return a titlecased version of S, i.e. words start with uppercase characters, all remaining cased characters have lowercase.

Returns: string
Overrides: str.title
(inherited documentation)

index(self, string)

Like S.find() but raise ValueError when the substring is not found.

Returns: int
Overrides: str.index
(inherited documentation)

isalnum(self)

Return True if all characters in S are alphanumeric and there is at least one character in S, False otherwise.

Returns: bool
Overrides: str.isalnum
(inherited documentation)

isalpha(self)

Return True if all characters in S are alphabetic and there is at least one character in S, False otherwise.

Returns: bool
Overrides: str.isalpha
(inherited documentation)

isdigit(self)

Return True if all characters in S are digits and there is at least one character in S, False otherwise.

Returns: bool
Overrides: str.isdigit
(inherited documentation)

islower(self)

Return True if all cased characters in S are lowercase and there is at least one cased character in S, False otherwise.

Returns: bool
Overrides: str.islower
(inherited documentation)

isspace(self)

Return True if all characters in S are whitespace and there is at least one character in S, False otherwise.

Returns: bool
Overrides: str.isspace
(inherited documentation)

istitle(self)

Return True if S is a titlecased string and there is at least one character in S, i.e. uppercase characters may only follow uncased characters and lowercase characters only cased ones. Return False otherwise.

Returns: bool
Overrides: str.istitle
(inherited documentation)

isupper(self)

Return True if all cased characters in S are uppercase and there is at least one cased character in S, False otherwise.

Returns: bool
Overrides: str.isupper
(inherited documentation)

zfill(self, length)

Pad a numeric string S with zeros on the left, to fill a field of the specified width. The string S is never truncated.

Returns: string
Overrides: str.zfill
(inherited documentation)

join(self, iter)

Return a string which is the concatenation of the strings in the iterable. The separator between elements is S.

Returns: string
Overrides: str.join
(inherited documentation)

lstrip(self, chars=None)

Return a copy of the string S with leading whitespace removed. If chars is given and not None, remove characters in chars instead. If chars is unicode, S will be converted to unicode before stripping

Returns: string or unicode
Overrides: str.lstrip
(inherited documentation)

rstrip(self, chars=None)

Return a copy of the string S with trailing whitespace removed. If chars is given and not None, remove characters in chars instead. If chars is unicode, S will be converted to unicode before stripping

Returns: string or unicode
Overrides: str.rstrip
(inherited documentation)

strip(self, chars=None)

Return a copy of the string S with leading and trailing whitespace removed. If chars is given and not None, remove characters in chars instead. If chars is unicode, S will be converted to unicode before stripping

Returns: string or unicode
Overrides: str.strip
(inherited documentation)

swapcase(self)

Return a copy of the string S with uppercase characters converted to lowercase and vice versa.

Returns: string
Overrides: str.swapcase
(inherited documentation)

count(self, sub, start=0, end=None)

Return the number of non-overlapping occurrences of substring sub in string S[start:end]. Optional arguments start and end are interpreted as in slice notation.

Returns: int
Overrides: str.count
(inherited documentation)

decode(self, encoding='utf-8', errors='strict')

Decodes S using the codec registered for encoding. encoding defaults to the default encoding. errors may be given to set a different error handling scheme. Default is 'strict' meaning that encoding errors raise a UnicodeDecodeError. Other possible values are 'ignore' and 'replace' as well as any other name registered with codecs.register_error that is able to handle UnicodeDecodeErrors.

Returns: object
Overrides: str.decode
(inherited documentation)

encode(self, encoding, errors='strict')

Encodes S using the codec registered for encoding. encoding defaults to the default encoding. errors may be given to set a different error handling scheme. Default is 'strict' meaning that encoding errors raise a UnicodeEncodeError. Other possible values are 'ignore', 'replace' and 'xmlcharrefreplace' as well as any other name registered with codecs.register_error that is able to handle UnicodeEncodeErrors.

Returns: object
Overrides: str.encode
(inherited documentation)

expandtabs(self, tabsize=8)

Return a copy of S where all tab characters are expanded using spaces. If tabsize is not given, a tab size of 8 characters is assumed.

Returns: string
Overrides: str.expandtabs
(inherited documentation)

find(self, sub, start=None, end=None)

Return the lowest index in S where substring sub is found, such that sub is contained within S[start:end]. Optional arguments start and end are interpreted as in slice notation.

Return -1 on failure.

Returns: int
Overrides: str.find
(inherited documentation)
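
For search methods such as find, index, rfind and rindex, the returned position depends on whether the string is measured in bytes or in characters. The sketch below uses plain Python 3 bytes and text rather than Utf8, on the assumption that Utf8 reports character positions.

```python
# Position of a substring: character index versus byte index.
text = '中文字'
needle = '字'

char_index = text.find(needle)
byte_index = text.encode('utf-8').find(needle.encode('utf-8'))
```

The two preceding characters occupy three bytes each, which is why the byte index is three times the character index in this example.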

ljust(self, width, fillchar=' ')

Return S left-justified in a string of length width. Padding is done using the specified fill character (default is a space).

Returns: string
Overrides: str.ljust
(inherited documentation)

partition(self, sep)

Search for the separator sep in S, and return the part before it, the separator itself, and the part after it. If the separator is not found, return S and two empty strings.

Returns: (head, sep, tail)
Overrides: str.partition
(inherited documentation)

replace(self, old, new, count=-1)

Return a copy of string S with all occurrences of substring old replaced by new. If the optional argument count is given, only the first count occurrences are replaced.

Returns: string
Overrides: str.replace
(inherited documentation)

rfind(self, sub, start=None, end=None)

Return the highest index in S where substring sub is found, such that sub is contained within S[start:end]. Optional arguments start and end are interpreted as in slice notation.

Return -1 on failure.

Returns: int
Overrides: str.rfind
(inherited documentation)

rindex(self, string)

Like S.rfind() but raise ValueError when the substring is not found.

Returns: int
Overrides: str.rindex
(inherited documentation)

rjust(self, width, fillchar=' ')

Return S right-justified in a string of length width. Padding is done using the specified fill character (default is a space)

Returns: string
Overrides: str.rjust
(inherited documentation)

rpartition(self, sep)

Search for the separator sep in S, starting at the end of S, and return the part before it, the separator itself, and the part after it. If the separator is not found, return two empty strings and S.

Returns: (head, sep, tail)
Overrides: str.rpartition
(inherited documentation)

rsplit(self, sep=None, maxsplit=-1)

Return a list of the words in the string S, using sep as the delimiter string, starting at the end of the string and working to the front. If maxsplit is given, at most maxsplit splits are done. If sep is not specified or is None, any whitespace string is a separator.

Returns: list of strings
Overrides: str.rsplit
(inherited documentation)

split(self, sep=None, maxsplit=-1)

Return a list of the words in the string S, using sep as the delimiter string. If maxsplit is given, at most maxsplit splits are done. If sep is not specified or is None, any whitespace string is a separator and empty strings are removed from the result.

Returns: list of strings
Overrides: str.split
(inherited documentation)

splitlines(self, keepends=False)

Return a list of the lines in S, breaking at line boundaries. Line breaks are not included in the resulting list unless keepends is given and true.

Returns: list of strings
Overrides: str.splitlines
(inherited documentation)

startswith(self, prefix, start=0, end=None)

Return True if S starts with the specified prefix, False otherwise. With optional start, test S beginning at that position. With optional end, stop comparing S at that position. prefix can also be a tuple of strings to try.

Returns: bool
Overrides: str.startswith
(inherited documentation)

translate(self, table, deletechars='')

Return a copy of the string S, where all characters occurring in the optional argument deletechars are removed, and the remaining characters have been mapped through the given translation table, which must be a string of length 256 or None. If the table argument is None, no translation is applied and the operation simply removes the characters in deletechars.

Returns: string
Overrides: str.translate
(inherited documentation)

endswith(self, prefix, start=0, end=None)

Return True if S ends with the specified suffix, False otherwise. With optional start, test S beginning at that position. With optional end, stop comparing S at that position. suffix can also be a tuple of strings to try.

Returns: bool
Overrides: str.endswith
(inherited documentation)

format(self, *args, **kwargs)

Return a formatted version of S, using substitutions from args and kwargs. The substitutions are identified by braces ('{' and '}').

Returns: string
Overrides: str.format
(inherited documentation)

__mod__(self, right)

x%y

Overrides: str.__mod__
(inherited documentation)

__ge__(self, string)
(Greater-than-or-equals operator)

x>=y

Overrides: str.__ge__
(inherited documentation)

__gt__(self, string)
(Greater-than operator)

x>y

Overrides: str.__gt__
(inherited documentation)

__le__(self, string)
(Less-than-or-equals operator)

x<=y

Overrides: str.__le__
(inherited documentation)

__lt__(self, string)
(Less-than operator)

x<y

Overrides: str.__lt__
(inherited documentation)