Implement other sampling methods

Currently only greedy sampling is implemented: at each step, the token with the highest probability is selected.
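For reference, greedy selection is just an argmax over the next-token scores. Here is a minimal sketch in plain Python. It assumes the model returns the next-token logits as a list of floats, and `greedy_next_token` is a hypothetical name rather than an existing function:

```python
from typing import List

def greedy_next_token(logits: List[float]) -> int:
    """Return the index of the highest-scoring token (greedy selection).

    Assumes `logits` is a plain list of floats for the next position.
    """
    # max() over token indices, compared by logit value; on ties, max()
    # keeps the first maximal element, i.e. the lowest index
    return max(range(len(logits)), key=lambda i: logits[i])
```

Softmax is monotonic, so the argmax of the logits is also the argmax of the probabilities. Greedy selection therefore never needs to compute the softmax.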

Implement other sampling methods. Some options are:

* top-p
* top-k
* temperature (here is an example of how it could be done: https://github.com/jaymody/picoGPT/pull/19)
* categorical sampling
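Below is a hypothetical sketch of how the four options could fit together, written in plain Python. It assumes the model returns the next-token logits as a list of floats; a NumPy version would look similar. None of the function names come from an existing API. Each technique works as follows:

* Categorical sampling draws a token from the softmax distribution.
* Temperature divides the logits before the softmax.
* Top-k and top-p zero out unlikely tokens and renormalize before the draw.

```python
import math
import random
from typing import List, Optional

def softmax(logits: List[float], temperature: float = 1.0) -> List[float]:
    """Convert logits to probabilities after dividing them by `temperature`."""
    if temperature <= 0:
        raise ValueError("temperature must be > 0; use argmax for greedy selection")
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def _renormalize(probs: List[float]) -> List[float]:
    total = sum(probs)
    return [p / total for p in probs]

def top_k_filter(probs: List[float], k: int) -> List[float]:
    """Keep only the k most probable tokens, then renormalize."""
    if k <= 0 or k >= len(probs):
        return list(probs)  # nothing to filter
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept = set(order[:k])
    return _renormalize([p if i in kept else 0.0 for i, p in enumerate(probs)])

def top_p_filter(probs: List[float], p: float) -> List[float]:
    """Keep the smallest set of most probable tokens whose total mass is >= p."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept = set()
    cumulative = 0.0
    for i in order:
        kept.add(i)
        cumulative += probs[i]
        if cumulative >= p:
            break
    return _renormalize([q if i in kept else 0.0 for i, q in enumerate(probs)])

def categorical_sample(probs: List[float], rng: random.Random) -> int:
    """Draw one token id with probability proportional to `probs`."""
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]

def sample_next_token(
    logits: List[float],
    temperature: float = 1.0,
    top_k: Optional[int] = None,
    top_p: Optional[float] = None,
    rng: Optional[random.Random] = None,
) -> int:
    """Temperature -> softmax -> optional top-k / top-p -> categorical draw."""
    rng = rng or random.Random()
    probs = softmax(logits, temperature)
    if top_k is not None:
        probs = top_k_filter(probs, top_k)
    if top_p is not None:
        probs = top_p_filter(probs, top_p)
    return categorical_sample(probs, rng)
```

With `temperature=1.0` and no filters, `sample_next_token` reduces to plain categorical sampling. With `top_k=1`, it reduces to greedy selection. The sketch applies temperature first and then the filters. That ordering is a common convention, but it is an assumption here and not the only possible choice.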