antropy.lziv_complexity

antropy.lziv_complexity(sequence, normalize=False)

Lempel-Ziv (LZ) complexity of (binary) sequence.

New in version 0.1.1.

Parameters
sequence : str or array

A sequence of characters, e.g. '1001111011000010', [0, 1, 0, 1, 1], or 'Hello World!'.

normalize : bool

If True, return the normalized LZ complexity (see Notes). Default is False.

Returns
lz : int or float

LZ complexity, which corresponds to the number of different substrings encountered as the stream is viewed from the beginning to the end. If normalize=False, the output is an integer (counts), otherwise the output is a float.

Notes

LZ complexity is defined as the number of different substrings encountered as the sequence is viewed from beginning to end.
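The counting described above can be sketched in pure Python. This is an illustrative left-to-right parsing written for this note, not necessarily how the library implements it; the helper names `lz_phrases` and `_occurs` are hypothetical. Each new substring is extended for as long as it can still be found somewhere in the text preceding its last symbol, and the number of substrings produced is the raw LZ.

```python
def _occurs(pattern, history):
    """Return True if `pattern` appears as a contiguous run inside `history`."""
    m = len(pattern)
    return any(history[j:j + m] == pattern for j in range(len(history) - m + 1))


def lz_phrases(sequence):
    """Split `sequence` into its successive new substrings (hypothetical helper)."""
    s = tuple(sequence)  # works for strings and for lists of ints / bools
    n = len(s)
    phrases, i = [], 0
    while i < n:
        k = 1
        # Grow the candidate while it is already present in everything
        # that precedes its last symbol (overlap with itself is allowed).
        while i + k <= n and _occurs(s[i:i + k], s[:i + k - 1]):
            k += 1
        phrases.append(s[i:i + k])  # slicing truncates the final phrase at n
        i += k
    return phrases


phrases = lz_phrases("1001111011000010")
print(["".join(p) for p in phrases])  # the substrings, in order
print(len(phrases))                   # raw LZ count produced by this sketch
```

For '1001111011000010' this sketch produces the six substrings 1 / 0 / 01 / 1110 / 1100 / 0010. It favors readability over speed, so it is only suitable for short sequences.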

Although the raw LZ is an important complexity indicator, it is heavily influenced by sequence length (longer sequences result in higher LZ). Zhang and colleagues (2009) therefore proposed the normalized LZ, which is defined by

\[\text{LZn} = \frac{\text{LZ}}{(n / \log_b{n})}\]

where \(n\) is the length of the sequence and \(b\) the number of unique characters in the sequence.
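As a worked check of the formula, the normalization can be reproduced by hand with the standard library. This is a sketch that follows the definition above, not the library's own code; `normalize_lz` is a hypothetical helper that takes an already computed raw LZ value.

```python
import math


def normalize_lz(lz, sequence):
    """Normalized LZ = LZ / (n / log_b(n)) (hypothetical helper)."""
    n = len(sequence)       # length of the sequence
    b = len(set(sequence))  # number of unique characters
    # Assumes b >= 2 and n >= 2: with a single unique character the
    # base-1 logarithm is undefined, so the formula cannot be evaluated.
    return lz / (n / math.log(n, b))


# Raw LZ of 6 taken from the first example in the Examples section.
print(normalize_lz(6, "1001111011000010"))  # n = 16, b = 2, so 6 / (16 / 4)
```

Because \(b\) is counted from the sequence itself, two sequences of equal length and equal raw LZ can still receive different normalized values if they use different numbers of distinct characters.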

References

Examples

>>> from antropy import lziv_complexity
>>> # Substrings = 1 / 0 / 01 / 1110 / 1100 / 0010
>>> s = '1001111011000010'
>>> lziv_complexity(s)
6

Using a list of integers or booleans instead of a string

>>> # 1 / 0 / 10
>>> lziv_complexity([1, 0, 1, 0, 1, 0, 1, 0, 1, 0])
3

With normalization

>>> lziv_complexity(s, normalize=True)
1.5

This function also works with characters and words

>>> s = 'ABCDEFGHIJKLMNOPQRSTUVWXYZ'
>>> lziv_complexity(s), lziv_complexity(s, normalize=True)
(26, 1.0)
>>> s = 'HELLO WORLD! HELLO WORLD! HELLO WORLD! HELLO WORLD!'
>>> lziv_complexity(s), lziv_complexity(s, normalize=True)
(11, 0.38596001132145313)