Rudi,
This is a complicated situation. It is probably explained in bits and pieces in other places, but for this problem I think it is easiest if I write a new explanation here.
First, the short answer is that I'm not sure exactly what is going wrong. I would need to work with the original source images to be sure.
However, from the stacked results that I see here, I can make some educated guesses.
The first big issue is that Zerene Stacker and Photoshop use very different methods for aligning images.
Zerene Stacker uses only shift/rotate/scale, and its measure of good fit is to minimize the average difference in pixel values across the entire image. This essentially assumes that the images were shot from the same viewpoint and with almost the same framing, with the images varying only in focus.
Photoshop also uses shift/rotate/scale, and in addition it will introduce various non-linear transformations like barrel/pincushion distortion and perspective keystoning. Photoshop's measure of good fit is to minimize the average distance between "features" that are selected for having distinctive patterns of pixel values in a small neighborhood. This method has its roots in side-by-side panorama stitching. It essentially assumes that the images were shot from the same viewpoint but possibly have very different framing and only a little overlap.
When the assumptions are violated, both methods will fail, but they fail in different ways. In the current stack, it looks to me like either the camera or the leaves were moving around so that the leaves and the fly and the background line up differently in different frames. Photoshop appears to have latched onto the fly that you care about, while "letting go" of the leaves, especially the leaves in the upper left corner. Zerene Stacker, in contrast, has essentially chosen to keep the leaves lined up as well as it could, while letting go of the fly. (This is because the fly occupies relatively few pixel positions compared to the leaves.)
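To make the "fewer pixels lose" point concrete, here is a small Python sketch. It is not Zerene Stacker's actual code: it searches shifts only (no rotate/scale), by brute force, on a synthetic 16x16 frame, and the names pattern, make_frame, mean_abs_diff, and best_shift are made up for this illustration. The "leaves" move one pixel to the right between frames while a small bright "fly" moves one pixel to the left.

```python
def pattern(x, y):
    """Synthetic 'leaf' texture: 7 gray levels, 30 apart (made up for this demo)."""
    return ((x + 3 * y) % 7) * 30

def make_frame(size, leaf_dx, fly_x, fly_y):
    """Build a size x size frame: leaf texture displaced by leaf_dx, plus a 2x2 bright 'fly'."""
    frame = [[pattern(x - leaf_dx, y) for x in range(size)] for y in range(size)]
    for y in range(fly_y, fly_y + 2):
        for x in range(fly_x, fly_x + 2):
            frame[y][x] = 255
    return frame

def mean_abs_diff(ref, img, dx, dy):
    """Average |ref[y][x] - img[y+dy][x+dx]| over the area where the two frames overlap."""
    h, w = len(ref), len(ref[0])
    total, count = 0, 0
    for y in range(h):
        for x in range(w):
            sy, sx = y + dy, x + dx
            if 0 <= sy < h and 0 <= sx < w:
                total += abs(ref[y][x] - img[sy][sx])
                count += 1
    return total / count if count else float("inf")

def best_shift(ref, img, max_shift=1):
    """Try every (dx, dy) within max_shift and keep the one with the smallest average difference."""
    candidates = [(dx, dy)
                  for dy in range(-max_shift, max_shift + 1)
                  for dx in range(-max_shift, max_shift + 1)]
    return min(candidates, key=lambda s: mean_abs_diff(ref, img, s[0], s[1]))

ref_frame = make_frame(16, leaf_dx=0, fly_x=6, fly_y=6)
next_frame = make_frame(16, leaf_dx=1, fly_x=5, fly_y=6)  # leaves go right, fly goes left
chosen = best_shift(ref_frame, next_frame)
print(chosen)
```

The chosen shift follows the leaves, because they cover almost the whole frame. The fly would need the opposite shift, but its handful of pixels is simply outvoted in the average.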
The second big issue is that Zerene Stacker and Photoshop have very different methods for combining images.
In Zerene Stacker, the PMax method is relentless about preserving the sharpest available detail at each pixel position. This works very well in a well-formed focus stack where everything is properly aligned, but when there are misalignments, it results in severe "ghosting" because each sharp detail may be seen simultaneously in all the positions it occupies across all frames.
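A rough way to see where ghosting comes from is the following Python sketch. It is a deliberate simplification and not how PMax is actually implemented: it works on a single row of pixels and simply keeps, at each position, the value from whichever frame has the highest local contrast there. The function names and the toy frames are assumptions for illustration only.

```python
def local_contrast(row, i):
    """Crude sharpness measure: how much pixel i differs from its two neighbors."""
    left = row[i - 1] if i > 0 else row[i]
    right = row[i + 1] if i < len(row) - 1 else row[i]
    return abs(2 * row[i] - left - right)

def sharpest_per_pixel(frames):
    """At each position, copy the pixel from the frame that is sharpest at that position."""
    width = len(frames[0])
    out = []
    for i in range(width):
        best = max(frames, key=lambda f: local_contrast(f, i))
        out.append(best[i])
    return out

# The same sharp detail (value 200) sits at position 3 in one frame
# and at position 6 in the other, i.e. the frames are misaligned.
frame_a = [10, 10, 10, 200, 10, 10, 10, 10, 10, 10]
frame_b = [10, 10, 10, 10, 10, 10, 200, 10, 10, 10]
ghosted = sharpest_per_pixel([frame_a, frame_b])
print(ghosted)
```

Because the detail is "sharpest" at position 3 in one frame and at position 6 in the other, the output keeps it in both places. That doubled detail is the ghost.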
Photoshop, on the other hand, is committed to the concept of carving out fairly large regions of pixels, each from a single source image, and then pasting those regions together with a bit of blending. When there are misalignments, this method avoids ghosting, but instead introduces "steps" on the region boundaries.
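The "steps" can be illustrated with another small Python sketch. I do not know Photoshop's internals, so this assumes the region assignment has already been decided (the labels list is hypothetical) and it leaves out the blending entirely. Both frames show the same smooth gradient, but the second frame is displaced by two pixels.

```python
def composite_from_regions(frames, labels):
    """Build the output by copying pixel i from the frame named in labels[i]."""
    return [frames[labels[i]][i] for i in range(len(labels))]

def largest_jump(row):
    """Biggest difference between neighboring pixels, used here to spot a step."""
    return max(abs(b - a) for a, b in zip(row, row[1:]))

# Same smooth gradient in both frames, but frame_b is misaligned by 2 pixels.
frame_a = [i * 10 for i in range(10)]
frame_b = [(i + 2) * 10 for i in range(10)]

# Hypothetical region map: left half taken from frame_a, right half from frame_b.
labels = [0] * 5 + [1] * 5
stepped = composite_from_regions([frame_a, frame_b], labels)
print(stepped, largest_jump(stepped))
```

Each source frame changes by only 10 from one pixel to the next, but the composite jumps by 30 where the two regions meet. There is no doubled detail anywhere, just a discontinuity on the seam, which blending can soften but not remove.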
These differences are evident in the two results that you sent to me. The ZS result shows ghosting; the CS6 result shows steps.
(As further information, Zerene Stacker has a second method, DMap, that is more like what Photoshop does. The major difference between DMap and Photoshop is that DMap assumes that the images are shot in depth order, either back to front or front to back. It uses this assumed ordering to figure out what to do in areas where there is no focused detail.)
The third big issue is that Zerene Stacker and Photoshop have very different methods for determining the final framing.
Zerene Stacker determines final framing by selecting a single source image, either the first or the last one in the input sequence, then using that one image's framing for the final result. The default method is to choose whichever end of the stack has the narrowest field of view, which tends to avoid an artifact known as "edge streaks".
Photoshop determines final framing by registering all the images against each other, then creating a large frame that encompasses all of the registered source images. In effect this produces the widest possible field of view, potentially including lots of edge areas that are covered by fewer than all of the source images.
These differences in framing are easily seen in the two results that you sent. It appears that there was substantial misalignment between the various frames, so that Photoshop produced a "wide angle" output covering all frames, while Zerene Stacker produced a "narrow angle" output determined by just one frame.
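The two framing strategies can be sketched in Python as follows. This is only an illustration under assumptions: each frame is represented by a made-up rectangle (x0, y0, x1, y1) giving where it lands after registration, "narrowest field of view" is approximated by smallest rectangle area, and the function names are invented here rather than taken from either program.

```python
def area(rect):
    """Area of a rectangle given as (x0, y0, x1, y1)."""
    x0, y0, x1, y1 = rect
    return (x1 - x0) * (y1 - y0)

def union_frame(rects):
    """Wide framing: the smallest rectangle that contains every registered frame."""
    return (min(r[0] for r in rects), min(r[1] for r in rects),
            max(r[2] for r in rects), max(r[3] for r in rects))

def narrowest_end_frame(rects):
    """Narrow framing: use the first or last frame only, whichever covers less area."""
    return min((rects[0], rects[-1]), key=area)

# Hypothetical footprints of three frames after alignment, in a shared coordinate system.
registered = [(0, 0, 100, 80), (-6, 4, 98, 88), (5, -3, 101, 73)]
wide = union_frame(registered)
narrow = narrowest_end_frame(registered)
print(wide, narrow)
```

With these numbers the wide frame is noticeably larger than any single source frame, and parts of it are covered by only one or two frames. The narrow frame is just the last frame's footprint.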
Finally, in Zerene Stacker there is an image-saving option labeled "Retain full dynamic range" that is often misunderstood. This option reduces contrast and brightens or darkens the image as necessary to avoid clipping pixel values that have been internally computed to be either "brighter-than-white" or "darker-than-black". Your Zerene Stacker output has a generally low-contrast appearance that is typical of this option having been selected when the output image was saved.
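The trade-off can be shown with a short Python sketch. The exact mapping that Zerene Stacker applies is not spelled out here, so the linear rescale below is only an assumption chosen to show the general effect, and clip_values and fit_range are made-up names.

```python
def clip_values(values, lo=0.0, hi=255.0):
    """Option off (as assumed here): anything outside lo..hi is cut to the nearest limit."""
    return [min(max(v, lo), hi) for v in values]

def fit_range(values, lo=0.0, hi=255.0):
    """Option on (as assumed here): squeeze the whole computed range linearly into lo..hi."""
    vmin = min(min(values), lo)
    vmax = max(max(values), hi)
    scale = (hi - lo) / (vmax - vmin)
    return [lo + (v - vmin) * scale for v in values]

# Internally computed values, two of them outside the displayable 0..255 range.
computed = [-51.0, 0.0, 102.0, 255.0, 306.0]
clipped = clip_values(computed)
fitted = fit_range(computed)
print(clipped)
print([round(v, 1) for v in fitted])
```

Clipping keeps normal contrast but throws away the difference between 255 and 306 (and between -51 and 0). Fitting the range keeps those differences, but what used to span 0 to 255 now spans a narrower range, which is why the saved image looks flat.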
With luck, this discussion will explain why the two output images have the appearances that they do.
But I suspect that you're really interested in knowing how you can get better results in the future.
The answer to that question is to avoid movement. Movement of either the subject or the camera will make it impossible for any stacking software to produce a perfect result.
If movement cannot be avoided, then try to select your viewpoint so that everything visible in the frame moves together. (In the current case, I suspect that the leaves at upper left and lower right are moving differently.)
If there is significant residual movement, then in Zerene Stacker it's usually better to use DMap than PMax. That will avoid the ghosting problem. Even so, you may end up getting a better result from Photoshop in cases like your current stack, where Photoshop happens to latch onto the subject you care about and keeps it aligned at the expense of other image elements.
Note that "avoid movement" may mean avoiding some situations altogether. There are not many successful focus stacks of flies on leaves in complex environments. The reason is that something almost always moves between frames. This not only messes up the automatically stacked result, but usually makes it difficult to fix with manual retouching as well. In the end (in these cases) the result is not worth the cost. Photographers who do successfully stack flies on leaves in complex environments generally do that by getting up early and shooting while both the air and the bugs are quiet. There's a definite reason why so many stacked flies are covered by dewdrops or frost!
Getting back to your original question:
> What am I doing wrong with the stack in Zerene, only stacked in Pmax.
I think the problem is not that you're doing something wrong in processing the stack, but rather that it's an impossible stack in the first place because things are moving around too much from one picture to another.
Again, I would have to see the original frames to be sure, but this is my best guess given what I see in your outputs.
Did this discussion answer your questions? Please let me know. Thanks!
Best regards,
Rik Littlefield
Zerene Systems