Many recent works attempt to index multimedia data based on characteristics such as speaker identify and emo- tional content. In this work, speaker segmentation is performed on movies to extract the shots in which the target actor is speaking. A case of speaker identification on conversational speech under noisy conditions - this work is organized into two phases; an audio classification phase, for the removal of non-speech content, followed by a speaker recognition phase. Along with the speaker models, Gaussian Mixture Models are constructed for sound effects like fight sequences and drum beats to refine the removal of non-speech sounds. Results prove the effectiveness of this deviation from the conventional methods. © 2007 IEEE.