The secret is built into the soon-to-be-released iTunes 8.1. iTunes will add some extra voice data to your music files that announces the name of the band and the title of the song. The rendering is done on the computer and uses the voices already installed with your OS. This is why PC users will hear a female voice and Mac users will hear a male one (Alex).
Though the additional audio data will be small, it will still increase the size of your iTunes Library. Eventually, we may be able to download tracks from iTunes with featured voices already included; for example, artists could announce their own tracks.
9to5Mac via Hrmph has posted some information from the original patent application for this technology. You can find some of it below.
Patent Application Details
Audio user interface for computing devices
In order to achieve portability, many hand-held devices use user interfaces that present various display screens to the user for interaction that is predominantly visual. Users can interact with the user interfaces to manipulate a scroll wheel and/or a set of buttons to navigate display screens to thereby access functions of the hand-held devices. However, these user interfaces can be difficult to use at times for various reasons. One reason is that the display screens tend to be small in size and form factor and therefore difficult to see. Another reason is that a user may have poor reading vision or otherwise be visually impaired. Even if the display screens can be perceived, a user will have difficulty navigating the user interface in eyes-busy situations when a user cannot shift visual focus away from an important activity and towards the user interface. Such activities include, for example, driving an automobile, exercising, and crossing a street.
It is noted that text strings that correspond to standard text strings can have pre-recorded audio files. Such text strings may correspond to common user interface controls, such as play, stop, previous, etc., and to common menu items such as Music, Extras, Backlight. These audio files can be created using a voice talent or speech synthesized from the voice talent's recordings. The other text displayed as part of the media player user interface that is usually user specific, such as contacts and customized playlist names, can all be synthesized by building a voice from the voice talent's recordings. This provides consistency by having the same voice for all textual data to be presented to the user.
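The split the patent describes, with pre-recorded clips for standard strings and synthesis for everything else, can be pictured with a rough sketch. This is only an illustration and not Apple's implementation: the clip table, the file paths, and the `synthesize` and `announce` functions are all hypothetical, and the synthesizer is a stand-in that just returns a file name instead of producing audio.

```python
from dataclasses import dataclass

# Hypothetical table of standard UI strings that ship with pre-recorded clips.
# The strings come from the patent text; the paths are made up for illustration.
PRERECORDED = {
    "Play": "clips/play.aiff",
    "Stop": "clips/stop.aiff",
    "Previous": "clips/previous.aiff",
    "Music": "clips/music.aiff",
    "Extras": "clips/extras.aiff",
    "Backlight": "clips/backlight.aiff",
}


@dataclass(frozen=True)
class Announcement:
    text: str    # the string shown in the user interface
    source: str  # "prerecorded" or "synthesized"
    clip: str    # path of the audio clip to play


def synthesize(text: str) -> str:
    """Stand-in for a synthesizer built from the voice talent's recordings.

    A real system would generate audio here; this sketch only derives
    a file name from the text.
    """
    slug = "".join(c.lower() if c.isalnum() else "_" for c in text)
    return f"synth/{slug}.aiff"


def announce(text: str) -> Announcement:
    """Use a pre-recorded clip when one exists, otherwise synthesize one."""
    clip = PRERECORDED.get(text)
    if clip is not None:
        return Announcement(text, "prerecorded", clip)
    return Announcement(text, "synthesized", synthesize(text))


if __name__ == "__main__":
    for label in ("Play", "Road Trip Mix"):
        print(announce(label))
```

In this sketch a menu item like "Play" resolves to a stored clip, while a user-specific playlist name such as "Road Trip Mix" falls through to the synthesizer. Because both paths are assumed to use the same voice, the listener would hear one consistent narrator either way.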