A machine learning model that identifies 20 diverse languages and reports ‘Other’ for languages on which the model was not trained. The approach uses a character-level TF-IDF featurizer, with characters taken within word boundaries, followed by a Multinomial Naive Bayes classifier.
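
The following is a minimal sketch of what such a pipeline might look like, assuming scikit-learn is used (the description above does not name a library). The n-gram range, the toy training sentences, the helper names `build_model` and `predict_language`, and the confidence threshold used to produce ‘Other’ are all illustrative assumptions; the actual model may handle ‘Other’ differently, for example by training on an explicit ‘Other’ class.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Toy data for illustration only; the real model is trained on 20 languages.
TRAIN_TEXTS = [
    "the quick brown fox jumps over the lazy dog",
    "where is the nearest train station",
    "el perro corre por el parque todos los dias",
    "donde esta la estacion de tren mas cercana",
    "der hund rennt jeden tag durch den park",
    "wo ist der naechste bahnhof bitte",
]
TRAIN_LABELS = ["en", "en", "es", "es", "de", "de"]


def build_model():
    """Character TF-IDF (within word boundaries) followed by Multinomial NB."""
    return make_pipeline(
        # "char_wb" builds character n-grams only from text inside word
        # boundaries; the (1, 3) n-gram range is an assumed setting.
        TfidfVectorizer(analyzer="char_wb", ngram_range=(1, 3)),
        MultinomialNB(),
    )


def predict_language(model, text, min_confidence=0.5):
    """Return the most probable label, or 'Other' below the threshold.

    The threshold rule is an assumption, not the documented method.
    """
    probs = model.predict_proba([text])[0]
    best = probs.argmax()
    if probs[best] < min_confidence:
        return "Other"
    return str(model.classes_[best])


model = build_model().fit(TRAIN_TEXTS, TRAIN_LABELS)
print(predict_language(model, "the train is late today"))
```

Restricting n-grams to word boundaries keeps features tied to individual words instead of spanning the spaces between them, which tends to reduce the feature space. A threshold on the predicted probability is only one way to flag unseen languages, and Naive Bayes probabilities are often overconfident, so any such cutoff would need tuning on held-out data.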