entropy.lziv_complexity

entropy.lziv_complexity(sequence, normalize=False)

Lempel-Ziv (LZ) complexity of a (binary) sequence.

New in version 0.1.1.

Parameters

sequence (str or array): A sequence of characters, e.g. '1001111011000010', [0, 1, 0, 1, 1], or 'Hello World!'.
normalize (bool): If True, returns the normalized LZ complexity (see Notes).

Returns

lz (int or float): LZ complexity, which corresponds to the number of different substrings encountered as the stream is viewed from the beginning to the end. If normalize=False, the output is an integer (count), otherwise the output is a float.

Notes

LZ complexity is defined as the number of different substrings encountered as the sequence is viewed from beginning to end.
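The parsing behind this definition can be sketched in a few lines. The block below is a minimal, quadratic-time illustration of an exhaustive-history (LZ76-style) parse, not the library's implementation; the helper name `lz76_phrases` is hypothetical. It assumes that a phrase keeps growing as long as an identical run starts at some earlier position (overlap allowed), which is the rule that reproduces the substring splits listed in the Examples section.

```python
def lz76_phrases(sequence):
    """Split `sequence` into phrases using an exhaustive-history (LZ76-style) parse.

    Hypothetical helper for illustration; not part of the entropy package.
    """
    # Strings are sliced directly; other iterables are copied into a list.
    s = sequence if isinstance(sequence, str) else list(sequence)
    n = len(s)
    phrases = []
    i = 0
    while i < n:
        length = 1
        # Grow the candidate while it still fits in the sequence and an
        # identical run starts at some earlier position j < i.
        while i + length <= n and any(
            s[j:j + length] == s[i:i + length] for j in range(i)
        ):
            length += 1
        # The phrase is the copied part plus one new symbol
        # (slicing truncates it if the sequence ends first).
        phrases.append(s[i:i + length])
        i += length
    return phrases


phrases = lz76_phrases('1001111011000010')
print(phrases)       # ['1', '0', '01', '1110', '1100', '0010']
print(len(phrases))  # 6
```

The number of phrases is the raw LZ complexity under this assumed rule. The sketch favours readability over speed, so it is only suitable for short sequences.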

Although the raw LZ is an important complexity indicator, it is heavily influenced by sequence length (longer sequences tend to yield higher LZ values). Zhang and colleagues (2009) have therefore proposed the normalized LZ, which is defined by

\[\text{LZn} = \frac{\text{LZ}}{(n / \log_b{n})}\]

where \(n\) is the length of the sequence and \(b\) the number of unique characters in the sequence.
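As a worked check of the formula, the normalization can be applied by hand to the raw counts. This is a small sketch under the definition above; the helper name `normalized_lz` and its argument guard are assumptions made for illustration, not the library's code.

```python
import math


def normalized_lz(lz, n, b):
    """Return LZ / (n / log_b(n)); hypothetical helper, not the library's code."""
    if n < 2 or b < 2:
        # log_b(n) is zero or undefined in these cases, so the ratio is not usable.
        raise ValueError("need n >= 2 and b >= 2")
    return lz / (n / math.log(n, b))


# Binary example: LZ = 6, n = 16, b = 2, so log_2(16) = 4 and LZn = 6 / (16 / 4).
print(normalized_lz(6, 16, 2))

# Alphabet example: LZ = 26, n = 26, b = 26, so log_26(26) = 1 and LZn = 26 / 26.
print(normalized_lz(26, 26, 26))
```

The two calls correspond to the normalized values 1.5 and 1.0 reported in the Examples section, up to floating-point rounding.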

References

Lempel, A., & Ziv, J. (1976). On the Complexity of Finite Sequences. IEEE Transactions on Information Theory, 22(1), 75–81. https://doi.org/10.1109/TIT.1976.1055501
Zhang, Y., Hao, J., Zhou, C., & Chang, K. (2009). Normalized Lempel-Ziv complexity and its application in bio-sequence analysis. Journal of Mathematical Chemistry, 46(4), 1203–1212. https://doi.org/10.1007/s10910-008-9512-2
https://en.wikipedia.org/wiki/Lempel-Ziv_complexity
https://github.com/Naereen/Lempel-Ziv_Complexity

Examples

>>> from entropy import lziv_complexity
>>> # Substrings = 1 / 0 / 01 / 1110 / 1100 / 0010
>>> s = '1001111011000010'
>>> lziv_complexity(s)
6

Using a list of integers or booleans instead of a string

>>> # 1 / 0 / 10
>>> lziv_complexity([1, 0, 1, 0, 1, 0, 1, 0, 1, 0])
3

With normalization

>>> lziv_complexity(s, normalize=True)
1.5

This function also works with characters and words

>>> s = 'ABCDEFGHIJKLMNOPQRSTUVWXYZ'
>>> lziv_complexity(s), lziv_complexity(s, normalize=True)
(26, 1.0)

>>> s = 'HELLO WORLD! HELLO WORLD! HELLO WORLD! HELLO WORLD!'
>>> lziv_complexity(s), lziv_complexity(s, normalize=True)
(11, 0.38596001132145313)