## We assume that usages of the token occur 'once in a while' as the
## student generates words. A statistical process that describes such
## a situation is a Poisson process. It can be expressed as
##   P(Y=y) = (e^(-lambda) * lambda^y) / y!
## where lambda is the expected rate (e.g., occurrences per hundred
## words) and y is the number actually observed. Your variables
## determine lambda:
##   log(lambda) = x * beta
## where beta is your vector of coefficients and x is the matrix of
## values your variables take on. This function is a little harder to
## fit than a straight line, but glm with a Poisson family specified
## does this. Because the model is linear in log(lambda), each
## coefficient is interpreted as an additive contribution to the log
## of the expected rate (so exp(beta) acts multiplicatively on the
## rate itself). The fitted coefficients are those whose expected
## rates most closely fit the counts that were present in our y (the
## vector of responses we matched to the rows of x).
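## A minimal sketch of the fit described above, run on simulated data
## rather than the real token counts. The names n, x1, x2, beta_true,
## lambda, y and fit are hypothetical and chosen only for this
## illustration; the coefficient values are assumptions, not results.
set.seed(1)
n  <- 500
x1 <- rnorm(n)                      # a continuous predictor (assumed)
x2 <- rbinom(n, 1, 0.5)             # a 0/1 predictor (assumed)
beta_true <- c(0.5, 0.3, -0.4)      # intercept, x1, x2 (assumed values)

## log(lambda) = x * beta, so lambda is exp() of the linear predictor
lambda <- exp(beta_true[1] + beta_true[2] * x1 + beta_true[3] * x2)
y <- rpois(n, lambda)               # simulated counts

## Fit the Poisson regression; the log link is the default for poisson
fit <- glm(y ~ x1 + x2, family = poisson)
coef(fit)                           # estimates on the log scale
exp(coef(fit))                      # multiplicative effects on the rate

## Check that dpois() agrees with the formula written out by hand
all.equal(dpois(3, lambda = 2), exp(-2) * 2^3 / factorial(3))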