Clarifications: You will likely need to experiment with the number of hidden units to find how many are needed to produce good performance without overfitting.
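As a rough illustration of that experiment, the Python/NumPy sketch below trains a small one-hidden-layer network for several hidden-layer sizes and reports training and validation MSE for each. A validation error that starts rising while training error keeps falling is one common sign of overfitting. The data set (a noisy sine curve), the helper name `fit_and_score`, the architecture (sigmoid hidden units, linear output unit) and all hyperparameters are hypothetical choices made for this example, not part of the assignment.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def fit_and_score(X_tr, y_tr, X_val, y_val, n_hidden, epochs=3000, lr=0.1, seed=0):
    """Hypothetical helper: train a 1-hidden-layer net (sigmoid hidden units,
    linear output) by batch gradient descent on squared error and return
    (train_mse, val_mse)."""
    rng = np.random.default_rng(seed)
    W1 = rng.normal(0.0, 0.5, size=(X_tr.shape[1], n_hidden))
    b1 = np.zeros(n_hidden)
    W2 = rng.normal(0.0, 0.5, size=(n_hidden, 1))
    b2 = np.zeros(1)
    n = len(X_tr)
    for _ in range(epochs):
        H = sigmoid(X_tr @ W1 + b1)          # hidden activations, shape (n, n_hidden)
        err = (H @ W2 + b2) - y_tr            # output error, shape (n, 1)
        # Gradients of the mean squared error (constant factor 2 folded into lr).
        gW2 = H.T @ err / n
        gb2 = err.mean(axis=0)
        dH = (err @ W2.T) * H * (1.0 - H)     # backpropagated through the sigmoid
        gW1 = X_tr.T @ dH / n
        gb1 = dH.mean(axis=0)
        W2 -= lr * gW2
        b2 -= lr * gb2
        W1 -= lr * gW1
        b1 -= lr * gb1

    def mse(X, y):
        return float(np.mean((sigmoid(X @ W1 + b1) @ W2 + b2 - y) ** 2))

    return mse(X_tr, y_tr), mse(X_val, y_val)


# Hypothetical data: noisy samples of sin(x), split into training and validation sets.
data_rng = np.random.default_rng(42)
X = data_rng.uniform(-3.0, 3.0, size=(80, 1))
y = np.sin(X) + data_rng.normal(0.0, 0.1, size=X.shape)
X_tr, y_tr, X_val, y_val = X[:60], y[:60], X[60:], y[60:]

# Sweep over candidate hidden-layer sizes and compare the two errors.
results = {h: fit_and_score(X_tr, y_tr, X_val, y_val, h) for h in (1, 2, 4, 8, 16)}
for h, (tr, val) in results.items():
    print(f"{h:2d} hidden units: train MSE {tr:.4f}, validation MSE {val:.4f}")
best_h = min(results, key=lambda h: results[h][1])
print("lowest validation MSE with", best_h, "hidden units")
```

Picking the size with the lowest validation error is only one heuristic; with so little data the ranking can change between random seeds, so repeating the sweep is advisable.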
Note that you may need to run the learning procedure more than once, starting from different sets of initial weights, to achieve good performance (that is, to reach a near-global error minimum rather than a poor local one). To track the progress of the learning algorithm during training, you might accumulate the total squared error over each epoch, then take the mean (and perhaps the square root) at the end of the epoch to produce a plot of mean squared error (\textit{MSE}) or root mean squared error (\textit{RMS error}) versus training epoch. You might also try adding some noisy patterns to the training set to see whether that improves generalization.
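A minimal NumPy sketch of this procedure follows, assuming a one-hidden-layer sigmoid network trained by per-pattern backpropagation on XOR; the task, the helper names `train_once` and `train_with_restarts`, and the hyperparameters are illustrative assumptions. Each restart draws fresh random initial weights and the run with the lowest final MSE is kept, while the squared error summed over each epoch is turned into per-epoch MSE and RMS error for plotting. The optional `noise_std` argument perturbs each input pattern with fresh Gaussian noise on every presentation, which is one way (not necessarily the one intended here) of training with noisy patterns.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def train_once(X, Y, n_hidden, epochs, lr, rng, noise_std=0.0):
    """Hypothetical helper: one training run from random initial weights.
    Returns (weights, mse_history, rms_history)."""
    n_in, n_out = X.shape[1], Y.shape[1]
    # Small random initial weights; the extra row in each matrix holds the bias.
    W1 = rng.uniform(-0.5, 0.5, size=(n_in + 1, n_hidden))
    W2 = rng.uniform(-0.5, 0.5, size=(n_hidden + 1, n_out))
    mse_hist, rms_hist = [], []
    for _ in range(epochs):
        total_sq_err = 0.0                      # accumulated over the whole epoch
        for i in rng.permutation(len(X)):       # present patterns in random order
            x = X[i]
            if noise_std > 0:
                # Optionally corrupt the input pattern with Gaussian noise.
                x = x + rng.normal(0.0, noise_std, size=x.shape)
            x_b = np.append(x, 1.0)
            h = sigmoid(x_b @ W1)
            h_b = np.append(h, 1.0)
            out = sigmoid(h_b @ W2)
            err = Y[i] - out
            total_sq_err += float(np.sum(err ** 2))
            # Backpropagation deltas for sigmoid units (computed before any update).
            delta_out = err * out * (1.0 - out)
            delta_hid = (W2[:-1] @ delta_out) * h * (1.0 - h)
            W2 += lr * np.outer(h_b, delta_out)
            W1 += lr * np.outer(x_b, delta_hid)
        # Mean over all patterns and output units, then the square root for RMS.
        mse = total_sq_err / (len(X) * n_out)
        mse_hist.append(mse)
        rms_hist.append(np.sqrt(mse))
    return (W1, W2), mse_hist, rms_hist


def train_with_restarts(X, Y, n_hidden, epochs=2000, lr=0.5, n_restarts=5, seed=0, noise_std=0.0):
    """Hypothetical helper: repeat training from different initial weights and
    keep the run whose final-epoch MSE is lowest."""
    rng = np.random.default_rng(seed)
    best = None
    for _ in range(n_restarts):
        run = train_once(X, Y, n_hidden, epochs, lr, rng, noise_std)
        if best is None or run[1][-1] < best[1][-1]:
            best = run
    return best


# Example: XOR with three hidden units.
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
Y = np.array([[0], [1], [1], [0]], dtype=float)
weights, mse_hist, rms_hist = train_with_restarts(X, Y, n_hidden=3, epochs=2000)
print(f"final MSE = {mse_hist[-1]:.4f}, final RMS error = {rms_hist[-1]:.4f}")
```

To draw the learning curve, `mse_hist` or `rms_hist` can be passed to a plotting routine such as `matplotlib.pyplot.plot`; to try noisy training patterns, call `train_with_restarts` with, for example, `noise_std=0.1`. Note that with noise enabled the recorded error also reflects the perturbed inputs, so comparing generalization fairly requires evaluating on clean held-out patterns.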