Recent advances in deep Convolutional Neural Networks (CNNs) have established them as a leading technique for a wide range of classification tasks, including object recognition, object detection, image segmentation, face recognition and medical image analysis. However, a significant drawback of CNNs is their need for a large amount of annotated data, which may not be available in the context of historical document analysis. In light of this, we present a novel CNN-based architecture, ResPho(SC)Net, to recognize handwritten word images in a zero-shot learning framework. Our method is a modified version of the Pho(SC)Net architecture with far fewer trainable parameters. Experiments were conducted on word images from two languages (Norwegian and English), and encouraging results were obtained. © 2023, The Author(s), under exclusive license to Springer Nature Switzerland AG.