By Kuang Hao, Research Computing, NUS Information Technology, on 24 September 2020
NLP (Natural Language Processing) is not easy. As soon as you take an interest in it, you realize that raw text cannot be fed directly to a machine that only understands 0s and 1s. To make things harder, the meaning of a word can be ambiguous; the interpretation of a sentence varies with context; and there are hundreds of languages, each with its own syntax and grammar rules. Fortunately, thanks to recent pioneers in NLP, there are standardized pipelines to follow and easy-to-use tools at hand.
This article introduces the most popular tools in NLTK (the Natural Language Toolkit), explains how to set up and use them, and describes NLTK's availability on our HPC clusters.