There are all kinds of ways they could get a bad result. One way to check is to run the program many times. If you run it 1000 times, the average result should come very close to the expected return. If it doesn't, then there is a bug. That's a simple fact. The result claimed is more than two standard deviations (2 SD) from the expected value, so it is unusable for any real purpose.
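Here is a minimal Python sketch of that check. It assumes a hypothetical model whose expected return can be computed exactly: a bet paying +1 or -1 (`simulate_return`). It also includes a deliberately broken copy (`buggy_simulate_return`) that records losses as 0. The script runs each model 1000 times and reports how far the average sits from the expected value, measured in standard errors (the SD of a single run divided by √n). Anything beyond 2 is flagged. All names are illustrative and not taken from any particular program.

```python
import math
import random
import statistics


def simulate_return(rng: random.Random, win_prob: float = 0.5) -> float:
    """Hypothetical model: a bet paying +1 with probability win_prob, else -1.
    Its exact expected return is 2 * win_prob - 1."""
    return 1.0 if rng.random() < win_prob else -1.0


def buggy_simulate_return(rng: random.Random, win_prob: float = 0.5) -> float:
    """Same hypothetical model with a deliberate bug: a loss is recorded as 0, not -1."""
    return 1.0 if rng.random() < win_prob else 0.0


def z_score(samples: list[float], expected: float) -> float:
    """Number of standard errors between the sample mean and the expected value."""
    n = len(samples)
    mean = statistics.fmean(samples)
    sd = statistics.stdev(samples)   # sample SD of a single run
    se = sd / math.sqrt(n)           # standard error of the mean of n runs
    if se == 0:
        # Every run gave the same value: either an exact match or an infinite gap.
        return 0.0 if mean == expected else math.inf
    return (mean - expected) / se


def check_simulation(sim, expected: float, runs: int = 1000,
                     seed: int = 42, threshold: float = 2.0) -> tuple[float, bool]:
    """Run `sim` many times; return (z, ok) where ok means |z| <= threshold."""
    rng = random.Random(seed)
    samples = [sim(rng) for _ in range(runs)]
    z = z_score(samples, expected)
    return z, abs(z) <= threshold


if __name__ == "__main__":
    expected = 2 * 0.5 - 1  # exact expected return of the fair +1/-1 bet: 0.0
    for name, sim in [("correct", simulate_return), ("buggy", buggy_simulate_return)]:
        z, ok = check_simulation(sim, expected)
        print(f"{name}: z = {z:+.2f} -> {'consistent' if ok else 'suspect a bug'}")
```

Measuring the gap in standard errors rather than single-run SDs is what makes 1000 runs useful. The standard error shrinks as 1/√n, so a genuine bias stands out more clearly with every added run. With a 2-standard-error cutoff, a correct simulation still lands outside about 5% of the time. Rerunning with a different seed is therefore a cheap first step before hunting for the bug.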
I have written several simulation programs and almost all of them had some bugs that needed to be fixed. I knew there were bugs because I knew what a reasonable output would look like. Someone without this type of background is very likely to miss things.