The authors' idea is to use the knowledge encoded in the pre-trained Transformer language model to communicate with the model through text, while drawing on the cross-attention operator used in multimodal models to fuse information from the different modalities. The model accepts dual inputs, an algorithmic problem specification in textual form and its corresponding graph representation over nodes, and outputs a textual answer to the question. The input graph representation follows the format of the algorithmic reasoning benchmark. We can assume that, after encoding, the text input is stored in T ∈ ℝ^(T×d_t) and the graph input in G ∈ ℝ^(N×d_g), with one row per token and one row per node respectively.
The forward pass of the model proceeds as follows. At step t, the text representation T^(t) is fed into the current Transformer layer:

Θ^(t+1) = FFN( softmax( (T^(t) Q_t)(T^(t) K_t)^⊤ / √d_k ) T^(t) V_t )

where Q_t, K_t, V_t ∈ ℝ^(d_t×d_k) are the query, key and value transformations respectively, and FFN is a feed-forward neural network. In a similar way, the graph representation G^(t) is fed into the corresponding graph layer, which implements a standard max-MPNN:

g_i^(t+1) = ϕ( g_i^(t), max_j ψ( g_i^(t), g_j^(t) ) )

where ψ, ϕ : ℝ^(d_g) × ℝ^(d_g) → ℝ^(d_g) are the learnable message and update functions respectively, and max is element-wise maximum aggregation.
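To make the two parallel streams concrete, here is a minimal NumPy sketch of a single step: a self-attention layer over the text representation and a fully-connected max-MPNN step over the node embeddings. All dimensions, the random weight initialisations, and the single-head/ReLU simplifications are assumptions made for illustration, not the paper's actual configuration.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

# Hypothetical sizes: T_len text tokens of width d_t, N graph nodes of width d_g.
T_len, d_t, N, d_g, d_k = 16, 64, 8, 128, 32
rng = np.random.default_rng(0)
T = rng.standard_normal((T_len, d_t))   # current text representation T^(t)
G = rng.standard_normal((N, d_g))       # current node embeddings g_i^(t)

# --- Transformer self-attention over the text stream (single head, toy FFN) ---
Wq, Wk, Wv = (0.1 * rng.standard_normal((d_t, d_k)) for _ in range(3))
W_ffn = 0.1 * rng.standard_normal((d_k, d_t))            # stand-in for the FFN

attn = softmax((T @ Wq) @ (T @ Wk).T / np.sqrt(d_k))     # (T_len, T_len)
Theta_next = np.maximum(attn @ (T @ Wv), 0) @ W_ffn      # Θ^(t+1)

# --- max-MPNN over the graph stream (fully connected message passing) ---
W_msg = 0.1 * rng.standard_normal((2 * d_g, d_g))        # message function ψ
W_upd = 0.1 * rng.standard_normal((2 * d_g, d_g))        # update function ϕ

pairs = np.concatenate(
    [np.repeat(G[:, None, :], N, axis=1),   # receiver g_i, broadcast over j
     np.repeat(G[None, :, :], N, axis=0)],  # sender   g_j, broadcast over i
    axis=-1)                                 # shape (N, N, 2*d_g)
messages = np.maximum(pairs @ W_msg, 0)      # ψ(g_i, g_j) for every node pair
aggregated = messages.max(axis=1)            # element-wise max over neighbours j
G_next = np.maximum(np.concatenate([G, aggregated], axis=-1) @ W_upd, 0)  # ϕ -> g_i^(t+1)

print(Theta_next.shape, G_next.shape)        # (16, 64) (8, 128)
```

Because ψ and ϕ carry no step index (see the next paragraph), the same W_msg and W_upd would be reused at every step t, whereas the Transformer weights are per layer.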
Note that the equation above only presents pairwise interactions between nodes for brevity - in reality, the graph network used here is a Triplet-MPNN, which also contains triplet interactions and a gating mechanism. Also note that the learnable parts of the graph layer have no time-step index: the same shared function is applied at every step. This fits nicely with the iterative and repetitive nature of graph algorithm computation. Once both streams have prepared their representations Θ^(t+1) and the node embeddings G^(t+1), the token embeddings in the Transformer stream are conditioned on the node embeddings of the graph via cross-attention, producing the final output of the block in the Transformer stream, T^(t+1):

T^(t+1) = FFN( softmax( (Θ^(t+1) Q_×)(G^(t+1) K_×)^⊤ / √d_k ) G^(t+1) V_× )

where Q_× ∈ ℝ^(d_t×d_k) and K_×, V_× ∈ ℝ^(d_g×d_k) are the query, key and value transformations of the cross-attention respectively.
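Below is a matching NumPy sketch of the cross-attention fusion step, again with hypothetical dimensions and randomly initialised weights standing in for Q_×, K_×, V_× and the FFN: queries come from the text-stream output Θ^(t+1), while keys and values come from the node embeddings G^(t+1), so each token can read from every node before the block emits T^(t+1).

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

# Hypothetical dimensions; Theta_next / G_next stand for Θ^(t+1) and G^(t+1).
T_len, d_t, N, d_g, d_k = 16, 64, 8, 128, 32
rng = np.random.default_rng(1)
Theta_next = rng.standard_normal((T_len, d_t))   # text-stream output of this block
G_next = rng.standard_normal((N, d_g))           # graph-stream output of this block

# Cross-attention: queries from the text stream, keys and values from the graph.
Q_x = 0.1 * rng.standard_normal((d_t, d_k))
K_x = 0.1 * rng.standard_normal((d_g, d_k))
V_x = 0.1 * rng.standard_normal((d_g, d_k))
W_ffn = 0.1 * rng.standard_normal((d_k, d_t))    # stand-in for the final FFN

attn = softmax((Theta_next @ Q_x) @ (G_next @ K_x).T / np.sqrt(d_k))  # (T_len, N)
T_next = np.maximum(attn @ (G_next @ V_x), 0) @ W_ffn                 # T^(t+1)

print(T_next.shape)  # (16, 64)
```

The attention matrix here has shape (number of tokens, number of nodes), which is exactly the text-conditioned-on-graph direction described above.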
First we initialize the