Whether you’re a programmer with little to no knowledge of Python, or an experienced data scientist or engineer, this Learning Path will walk you through natural language processing, using both Python and Scala, and show you how to implement a range of popular tools including Spark, scikit-learn, SpaCy, NLTK, and gensim for text mining.
SpaCy, a fast, user-friendly library for teaching computers to understand text, simplifies NLP techniques, such as speech tagging and syntactic dependencies, so you can easily extract information, attributes, and objects from massive amounts of text to then document, measure, and analyze. This Learning Path is a hands-on introduction to using SpaCy to discover insights through natural language processing. While end-to-end natural language processing solutions can be complex, you’ll learn the linguistics, algorithms, and machine learning skills to get the job done.
Even though computers can't read, they're very effective at extracting information from natural language text. They can determine the main themes in the text, figure out if the writers of the text have positive or negative feelings about what they've written, decide if two documents are similar, add labels to documents, and more.
The course is designed for engineers and data scientists who have some familiarity with Scala, Apache Spark, and machine learning who need to process large natural language text in a distributed fashion.We will use sample of posts from the subreddit /r/WritingPrompts, which contains short stories and comments about the short stories.The course has four parts1. Building a natural language processing and entity extraction pipeline on Scala & Spark2.