I wish the term "crop factor" had never been coined.
If we instead take any sensor or film format simply as the size it is, it becomes a bit easier to understand how it relates to depth of field. The simplest way to picture it is to imagine that you yourself are also smaller or larger than average, so that the world around you is either a Lilliputian world or a giant world. For a small person with a small camera, everything is far away, and far away means the DOF is large. For a giant person with a giant camera, everything is close, so DOF is shallow, because everything is effectively shot close-up.
And then we should not treat focal length the same way when we make a comparison. Maybe the simplest case is comparing a full frame camera with a µ4/3 camera. The full frame camera uses a 50 mm lens and the µ4/3 camera uses a 25 mm lens. Both shoot the same subject from the same distance. At f/2 the 50 mm lens has an entrance pupil of 25 mm (50 mm / 2), while the 25 mm lens on the µ4/3 camera has an entrance pupil of 12.5 mm at the same f-stop. The two images have the same angle of view, but seen from the position of the subject, the entrance pupil of the µ4/3 lens subtends a smaller angle, which implies a larger depth of field. Stopping the 50 mm lens down to f/4 gives it the same 12.5 mm entrance pupil, so the camera with the smaller sensor at f/2 with this normal lens has the same DOF as the larger one at f/4.
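To check the numbers, here is a small Python sketch of the comparison. It uses the common thin-lens depth-of-field approximation. The 3 m subject distance and the circle-of-confusion values (0.030 mm for full frame, 0.015 mm for µ4/3, scaled with sensor size) are my own assumptions for illustration, and the function names are made up for this example.

```python
def entrance_pupil_mm(focal_mm, f_number):
    """Entrance pupil diameter: focal length divided by f-number."""
    return focal_mm / f_number


def depth_of_field_mm(focal_mm, f_number, coc_mm, subject_mm):
    """Total depth of field from the usual thin-lens approximation.

    coc_mm is the acceptable circle of confusion on the sensor.
    Returns infinity when the subject is at or beyond the hyperfocal distance.
    """
    hyperfocal = focal_mm ** 2 / (f_number * coc_mm) + focal_mm
    if subject_mm >= hyperfocal:
        return float("inf")
    near = subject_mm * (hyperfocal - focal_mm) / (hyperfocal + subject_mm - 2 * focal_mm)
    far = subject_mm * (hyperfocal - focal_mm) / (hyperfocal - subject_mm)
    return far - near


# Assumed values, not from the text above: subject distance and
# circles of confusion (the smaller sensor gets half the full frame value).
SUBJECT_MM = 3000.0
FF_COC_MM = 0.030
MFT_COC_MM = 0.015

ff_f2 = depth_of_field_mm(50, 2, FF_COC_MM, SUBJECT_MM)
ff_f4 = depth_of_field_mm(50, 4, FF_COC_MM, SUBJECT_MM)
mft_f2 = depth_of_field_mm(25, 2, MFT_COC_MM, SUBJECT_MM)

print("Entrance pupil, 50 mm at f/2:", entrance_pupil_mm(50, 2), "mm")
print("Entrance pupil, 25 mm at f/2:", entrance_pupil_mm(25, 2), "mm")
print("Entrance pupil, 50 mm at f/4:", entrance_pupil_mm(50, 4), "mm")
print(f"DOF full frame 50 mm f/2: {ff_f2:.0f} mm")
print(f"DOF full frame 50 mm f/4: {ff_f4:.0f} mm")
print(f"DOF µ4/3 25 mm f/2:       {mft_f2:.0f} mm")
```

Under these assumptions the µ4/3 result at f/2 should land within about a percent of the full frame result at f/4, and at roughly twice the full frame result at f/2. The small remaining difference comes from the approximation, not from the sensors.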