## We assume that usages of the token occur 'once in a while' as the
## student generates words. A statistical process that describes such
## a situation is a Poisson process. It can be expressed as
## P(Y=y) = (e^(-lambda) * lambda^y) / y!, where lambda is the expected
## rate (e.g., occurrences per hundred words) and y is the number
## actually observed. Your variables determine lambda:
## log(lambda)=x*beta where beta is your list of coefficients and x is
## the matrix of values your variables take on. This is a slightly
## harder function to fit than a straight line, but glm with a Poisson
## family specified does this. The coefficients are thus interpreted as
## contributing to the log of the expected rate that most closely fits
## the counts that were present in our y (the vector of responses we
## matched to the matrix of xes).
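##
## A minimal sketch of such a fit, on simulated data rather than the
## real counts. The names n, x1, beta_true, lambda_sim, y_sim and fit
## are hypothetical, as are the coefficient values; only glm() with a
## Poisson family comes from the description above.
set.seed(1)                                   # make the simulation reproducible
n <- 500                                      # hypothetical number of texts
x1 <- rnorm(n)                                # one hypothetical predictor
beta_true <- c(0.5, 0.3)                      # assumed intercept and slope (log scale)
lambda_sim <- exp(beta_true[1] + beta_true[2] * x1)  # log(lambda) = x * beta
y_sim <- rpois(n, lambda_sim)                 # counts drawn at rate lambda
fit <- glm(y_sim ~ x1, family = poisson)      # log link is the default for poisson
coef(fit)                                     # estimates on the log-rate scale
exp(coef(fit))                                # multiplicative effect on the expected rate
## The probability formula can be checked against R's built-in dpois,
## here for y = 3 and lambda = 2 (values chosen only for illustration):
all.equal(dpois(3, 2), exp(-2) * 2^3 / factorial(3))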