entropy.lziv_complexity
entropy.lziv_complexity(sequence, normalize=False)
Lempel-Ziv (LZ) complexity of (binary) sequence.
New in version 0.1.1.
- Parameters
  - sequence : str or array
    A sequence of characters, e.g. '1001111011000010', [0, 1, 0, 1, 1], or 'Hello World!'.
  - normalize : bool
    If True, returns the normalized LZ (see Notes).
- Returns
  - lz : int or float
    LZ complexity, which corresponds to the number of different substrings encountered as the stream is viewed from the beginning to the end. If normalize=False, the output is an integer (counts), otherwise the output is a float.
Notes
LZ complexity is defined as the number of different substrings encountered as the sequence is viewed from beginning to end.
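The definition above can be illustrated with a minimal pure-Python sketch. It is not the library's implementation: the names `lz76_phrases` and `_occurs` are hypothetical, and the parsing rule (each new substring is the shortest one that has not yet appeared in the preceding text, with overlap allowed) is an assumption based on Lempel and Ziv (1976). The brute-force search is meant for readability, not speed.

```python
def _occurs(needle, haystack):
    """Return True if `needle` appears as a contiguous run inside `haystack`."""
    m = len(needle)
    return any(haystack[k:k + m] == needle
               for k in range(len(haystack) - m + 1))


def lz76_phrases(sequence):
    """Split `sequence` into successive new substrings (hypothetical helper)."""
    items = list(sequence)
    n = len(items)
    phrases = []
    i = 0
    while i < n:
        length = 1
        # Grow the candidate while it can still be found in the text that
        # precedes its last symbol (so a match may overlap the candidate).
        while i + length <= n and _occurs(items[i:i + length],
                                          items[:i + length - 1]):
            length += 1
        # Slicing past the end is safe: the final phrase is whatever remains.
        phrases.append(items[i:i + length])
        i += length
    return phrases


phrases = lz76_phrases('1001111011000010')
print([''.join(p) for p in phrases])  # ['1', '0', '01', '1110', '1100', '0010']
print(len(phrases))                   # 6
```

The number of phrases returned is the raw LZ count; for the inputs used in the Examples section this sketch gives the same counts (6, 3, 26 and 11).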
Although the raw LZ is an important complexity indicator, it is heavily influenced by sequence length (longer sequences yield higher LZ). Zhang and colleagues (2009) have therefore proposed the normalized LZ, which is defined by
\[\text{LZn} = \frac{\text{LZ}}{(n / \log_b{n})}\]
where \(n\) is the length of the sequence and \(b\) the number of unique characters in the sequence.
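As a worked instance of the formula, the following sketch applies the normalization to an LZ count that is already known. The helper name `normalized_lz` is hypothetical and is not part of the library; it assumes \(b\) is taken as the number of distinct symbols in the sequence, and it does not handle a sequence with a single unique symbol (\(b = 1\)), for which the logarithm base is invalid.

```python
import math


def normalized_lz(lz, sequence):
    """Apply LZn = LZ / (n / log_b(n)) to a known raw LZ count (hypothetical helper)."""
    n = len(sequence)       # sequence length
    b = len(set(sequence))  # number of unique symbols (assumed definition of b)
    return lz / (n / math.log(n, b))


# Binary string of length 16 with raw LZ = 6:
# log_2(16) = 4, so n / log_b(n) = 16 / 4 = 4 and LZn = 6 / 4, i.e. about 1.5.
print(normalized_lz(6, '1001111011000010'))
```

With \(n = 16\) and \(b = 2\), the denominator is \(16 / 4 = 4\), which agrees with the normalized value of 1.5 reported in the Examples section.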
References
Lempel, A., & Ziv, J. (1976). On the Complexity of Finite Sequences. IEEE Transactions on Information Theory, 22(1), 75–81. https://doi.org/10.1109/TIT.1976.1055501
Zhang, Y., Hao, J., Zhou, C., & Chang, K. (2009). Normalized Lempel-Ziv complexity and its application in bio-sequence analysis. Journal of Mathematical Chemistry, 46(4), 1203–1212. https://doi.org/10.1007/s10910-008-9512-2
Examples
>>> from entropy import lziv_complexity
>>> # Substrings = 1 / 0 / 01 / 1110 / 1100 / 0010
>>> s = '1001111011000010'
>>> lziv_complexity(s)
6
Using a list of integers / booleans instead of a string
>>> # 1 / 0 / 10
>>> lziv_complexity([1, 0, 1, 0, 1, 0, 1, 0, 1, 0])
3
With normalization
>>> lziv_complexity(s, normalize=True)
1.5
This function also works with characters and words
>>> s = 'ABCDEFGHIJKLMNOPQRSTUVWXYZ'
>>> lziv_complexity(s), lziv_complexity(s, normalize=True)
(26, 1.0)
>>> s = 'HELLO WORLD! HELLO WORLD! HELLO WORLD! HELLO WORLD!'
>>> lziv_complexity(s), lziv_complexity(s, normalize=True)
(11, 0.38596001132145313)