Sample complexity for learning Boltzmann Distribution parameters

2018-10-10 13:17:55

I am trying to think through the number of samples I would need to estimate the parameters of a Boltzmann distribution to a desired precision.

Suppose that there are N possible states of the world, with the probability of state i being observed equal to

$$Pr(i \mid \theta) = \frac{e^{-\theta_i}}{\sum_{j=1}^N e^{-\theta_j}}.$$

I don't know the values of $\{\theta_i\}_{i=1}^N$, but I can draw independent observations from the set $\{1,\dots,N\}$ of possible states of the world, distributed according to $Pr(i \mid \theta)$ as given above.

Let's say I draw $m$ observations $X_1,...,X_m \in \{1,...,N\}$, and define the maximum likelihood estimator

$$\widehat{\theta} = \arg \max_{\theta'} \sum_{j=1}^m \log Pr(X_j \mid \theta').$$

As $m \to \infty$, $\widehat{\theta}$ converges to $\theta$ (up to an additive constant, since shifting every $\theta_i$ by the same amount leaves the distribution unchanged). However, I'm not sure how many samples $m$ I would need to ensure that the estimate $\widehat{\theta}$ is probably approximately correct. That is, how large does $m$ need to be so that $\|\widehat{\theta} - \theta\| \le \epsilon$ holds with probability at least $1 - \delta$?
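For what it's worth, here is a quick numerical sketch of the setup. Since the model is fully parameterized (one $\theta_i$ per state), the MLE of the state probabilities is just the empirical frequencies, and $\widehat{\theta}_i = -\log \hat{p}_i$ recovers the parameters up to an additive constant. The particular $N$, seed, and sample sizes below are arbitrary choices for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)

N = 5
theta = rng.normal(size=N)                 # hypothetical "true" parameters
p = np.exp(-theta) / np.exp(-theta).sum()  # Boltzmann probabilities

def mle_theta(samples, N):
    # For this saturated model the MLE of the probabilities is the
    # empirical frequency of each state; theta is then -log(p_hat),
    # identified only up to an additive constant.  Unobserved states
    # give p_hat = 0 and hence an infinite theta_hat.
    counts = np.bincount(samples, minlength=N)
    p_hat = counts / len(samples)
    return -np.log(p_hat)

errs = []
for m in [100, 10_000, 1_000_000]:
    X = rng.choice(N, size=m, p=p)
    theta_hat = mle_theta(X, N)
    # Compare after centering, since theta is only identified up to a shift.
    err = np.max(np.abs((theta_hat - theta_hat.mean())
                        - (theta - theta.mean())))
    errs.append(err)
    print(f"m = {m:>9}: max centered error = {err:.4f}")
```

Empirically the centered error shrinks roughly like $1/\sqrt{m}$, which is what I'd expect from a concentration argument, but I'd like a rigorous sample-complexity bound in terms of $\epsilon$, $\delta$, and $N$.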