Protein-ligand binding affinity prediction is critical for drug development, where rapid and accurate assessment of drug-target binding remains a key challenge. Deep learning-based modeling offers a more efficient and scalable alternative to traditional experimental approaches. This paper proposes a multi-scale interaction feature fusion model for protein-ligand binding affinity prediction, named the multi-scale binding affinity predictor (MSBind). MSBind employs multiple attention mechanisms to extract both global and local interaction features between the protein and the ligand, and models this multi-scale information jointly to improve predictive performance. Experimental results showed that on the PDBbind v2016 core dataset, MSBind achieved a root mean square error (RMSE) of 1.23, significantly lower than those of the baseline models. Furthermore, a dimensionality-reduction visualization of the extracted interaction features provided additional support for the interpretability and effectiveness of MSBind. In summary, by integrating multi-scale interaction features, MSBind provides a high-performance solution for protein-ligand binding affinity prediction.
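
For readers unfamiliar with the reported metric, the following minimal sketch shows how a root mean square error between measured and predicted affinities can be computed. The function name `rmse` and the numbers are hypothetical and chosen for illustration only; they are not data from this work. The sketch also assumes that affinities are expressed on a negative-log scale (such as pKd or pKi), which the text above does not state explicitly.

```python
import math


def rmse(y_true, y_pred):
    """Root mean square error between measured and predicted affinities."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    # Square each per-complex error, average, then take the square root.
    squared_errors = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Hypothetical affinities (assumed negative-log scale); not data from the paper.
measured = [6.0, 7.5, 4.0, 9.0]
predicted = [5.0, 8.5, 5.0, 8.0]
print(rmse(measured, predicted))  # every error is 1 unit, so this prints 1.0
```

Because the errors are squared before averaging, a few badly mispredicted complexes raise the score more than many small errors do, so a lower value indicates predictions that are consistently close to the measurements.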