Constructing transition probability matrix

William 08/07/2018. 3 answers, 179 views

I have the following list:

x={"A", "A", "A", "E", "D", "D", "D", "C", "B", "E", "E", "E", "D", \
"B", "A", "D", "B", "E", "C", "A", "D", "A", "A", "A", "A", "C", "C", \
"C", "D", "D", "E"}

I want to make the Markov transition probability matrix of first order. To do so I have started by writing:

Partition[x, 2, 1] // Sort // Counts

This will give:

<|{"A", "A"} -> 5, {"A", "C"} -> 1, {"A", "D"} -> 2, {"A", "E"} ->
1, {"B", "A"} -> 1, {"B", "E"} -> 2, {"C", "A"} -> 1, {"C", "B"} ->
1, {"C", "C"} -> 2, {"C", "D"} -> 1, {"D", "A"} -> 1, {"D", "B"} ->
2, {"D", "C"} -> 1, {"D", "D"} -> 3, {"D", "E"} -> 1, {"E", "C"} ->
1, {"E", "D"} -> 2, {"E", "E"} -> 2|>

above shows the frequencies of state transition A to A, A to B, A to C, A to D and A to E and so on for other letters, I wonder how can I show this result as a matrix?

Henrik Schumacher 08/07/2018.

You can use SparseArray with additive assembly as follows:

x = RandomChoice[Alphabet["English", "IndexCharacters"], 1000000];
data = Flatten[ToCharacterCode[x]] - (ToCharacterCode["A"][[1]] - 1); // AbsoluteTiming // First
A = With[{
spopt = SystemOptions["SparseArrayOptions"]},
InternalWithLocalSettings[
(*switch to additive assembly*)
SetSystemOptions["SparseArrayOptions" -> {"TreatRepeatedEntries" -> Total}],

(*assemble matrix*)
SparseArray[
Partition[data, 2, 1] -> 1,
Max[data] {1, 1}
]

,
(*reset "SparseArrayOptions" to previous value*)
SetSystemOptions[spopt]]
]; // AbsoluteTiming // First

0.739454

0.114682

Remarks

• As the timings suggest, it is worthwhile to avoid strings in the first place.

• Formerly, I used LetterNumber, but ToCharacterCode is much, much faster.

• It is "TreatRepeatedEntries" -> Total which enables summing of entries. Count is not needed anymore. DeveloperToPackedArray might speed up things a bit if x is very long. The other hokus-pokus is for making things bulletproof against aborts (i.e., options are reset even if computations are interrupted). See also (37566) and (136017).

Anton Antonov 08/08/2018.

You can use the package CrossTabulate.m. (More detailed references are given in this MSE answer.)

Import["https://raw.githubusercontent.com/antononcube/MathematicaForPrediction/master/CrossTabulate.m"]

cmat = CrossTabulate[Partition[x, 2, 1]];

MatrixForm[cmat]

cmat["SparseMatrix"] = cmat["SparseMatrix"]/Total[cmat["SparseMatrix"], {2}];
MatrixForm[cmat]

ArrayRules[cmat["SparseMatrix"]]

(* {{1, 1} -> 5/9, {1, 5} -> 1/9, {1, 4} -> 2/9, {1, 3} -> 1/
9, {2, 5} -> 2/3, {2, 1} -> 1/3, {3, 2} -> 1/5, {3, 1} -> 1/
5, {3, 3} -> 2/5, {3, 4} -> 1/5, {4, 4} -> 3/8, {4, 3} -> 1/
8, {4, 2} -> 1/4, {4, 1} -> 1/8, {4, 5} -> 1/8, {5, 4} -> 2/
5, {5, 5} -> 2/5, {5, 3} -> 1/5, {_, _} -> 0} *)

kglr 08/08/2018.

You can also use EstimatedProcess and MarkovProcessProperties as follows:

states = DeleteDuplicates[x];
ordering = Ordering[states];
data = ArrayComponents @ x ;
estproc = EstimatedProcess[data, DiscreteMarkovProcess[Length@states]];
tm = MarkovProcessProperties[estproc, "TransitionMatrix"][[ordering, ordering]]

TeXForm[TableForm[tm, TableHeadings -> {states[[ordering]], states[[ordering]]}]]

$\begin{array}{cccccc} & \text{A} & \text{B} & \text{C} & \text{D} & \text{E} \\ \text{A} & \frac{5}{9} & 0 & \frac{1}{9} & \frac{2}{9} & \frac{1}{9} \\ \text{B} & \frac{1}{3} & 0 & 0 & 0 & \frac{2}{3} \\ \text{C} & \frac{1}{5} & \frac{1}{5} & \frac{2}{5} & \frac{1}{5} & 0 \\ \text{D} & \frac{1}{8} & \frac{1}{4} & \frac{1}{8} & \frac{3}{8} & \frac{1}{8} \\ \text{E} & 0 & 0 & \frac{1}{5} & \frac{2}{5} & \frac{2}{5} \\ \end{array}$