A genetic example of Latent Dirichlet Allocation
There are lots of blogs or examples online to explain how a document is generated by Latent Dirichelt Allocation. But during a discussion last week, Prof. Banks gave me a very informative example of Latent Dirichelet Allocation, and it illustrates the way we inherit the genes from our ancestors.
Suppose Mark is a young data scientist living in the San Francisco Bay Area, his maternal family is from Japan, and his paternal family is from Europe. His favorite food is volcano rolls and Cobb Salad. What kind of genes might Mark have? Let’s try to analyze it in a LDA way:
- Mark (Document): 50% genes are from his mother, and 50% genes are from his father.
- Mark’s mother (Topic): 10% Chinese genes(word), 10% Korean genes(word), 80% Japanese genes(word).
- Mark’s father (Topic): 75% British genes(word), 25% German genes(word).
How every part of genes is chosen(generated) before he was born?
- First, choosing if he wants to inherit the gene from his mother or his father
- Then picking the gene based on the probability of the chosen parent
This example is impressive and I personally think it’s more reasonable than many topic modeling examples, since usually the way people use words and make sentences is more dynamic and unpredictable.