INTRODUCTION TO THE TASK!
This Capstone Project is about a basic form of Natural Language Processing (NLP) called Sentiment Analysis. For this task, you are required to use two different neural networks in Keras to try and classify a book review as either positive or negative, and report on which network type worked better. For example, consider the review, “This book is not very good.” This text ends with the words “very good” which indicates a very positive sentiment, but it is negated because it is preceded by the word “not”, so the text should be classified as having a negative sentiment. We need to teach our neural network to recognise this distinction and be able to classify the review correctly. This problem can be broken down into the following steps:
1. Get the dataset
2. Preprocess the data
3. Build the model
4. Train the model
5. Test the model
6. Predict something
We will be working with real-world data in this task.
GET THE DATASET
For this task, we will be using a small portion of the Multi-Domain Sentiment Dataset, which contains product reviews from Amazon. The full dataset contains reviews for products under the categories: kitchen, books, DVDs, and electronics, but we will only be looking at reviews for the book category.
We have two files, positive.txt and negative.txt, containing the reviews. Each review is associated with a number of fields. The only field we are interested in is the “title” field, which contains the title of the review. We are going to use this title to predict the sentiment of the review. We could have used the review text itself, but because it is much longer than the title, classifying it would be a much harder task.
For example, a review title might be “Horrible book”, whilst the review text might be “This book was horrible. If it was possible to rate it lower than one star I would have.”
Both the review title and the review text have the same sentiment — the title is just much more concise, which makes this task easier.
While some sentiments are easy to classify, like “don’t buy this horrible book”, others are less straightforward, like “run don’t walk to buy this book”. The latter is hard to classify because it contains the word “don’t”, which might be seen as an indication that this is a negative review, whereas it is actually a positive one (the reviewer is suggesting that you should go as fast as you can to get the book). Thus, sentiment analysis is not always straightforward: some samples will be easy to classify, while others will not.
Some code is already included in the notebook associated with this task to start you off, which loads the relevant part of the data (the review heading) and performs some preliminary preprocessing to remove strange characters.
It is also necessary to create a vocabulary, the set of words from the text corpus that our neural network will know, and to “tokenise” the input. If we have a review, such as “a good book”, it is necessary to turn this into a form that a computer can understand. First, each word in the dataset is mapped to a unique number in the vocabulary. A word tokeniser will then take a sentence like this and convert it to a sequence of numbers, which map to the relevant words in the vocabulary.
E.g. “a good book” becomes an array of numbers: [1, 12, 3]. This mapping means that “a” is the first word in the vocabulary, “good” is the twelfth and “book” is the third. The mapping depends on the dataset supplied to the tokeniser.
The code for tokenisation is already included, but it is important that you understand what it does.
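To make the idea concrete, here is a minimal sketch of tokenisation in plain Python. It is not the notebook's code: the function names and the three titles are made up for illustration, and ties between equally frequent words are broken alphabetically, which is an assumption of this sketch. Like the Keras Tokenizer, it ranks words by frequency and starts numbering at 1, leaving 0 free for padding.

```python
from collections import Counter


def build_word_index(texts):
    """Map each word to an integer, most frequent word first, starting at 1."""
    counts = Counter(word for text in texts for word in text.lower().split())
    # Sort by descending frequency; ties are broken alphabetically (an assumption).
    ranked = sorted(counts, key=lambda w: (-counts[w], w))
    # Index 0 is left unused so it can later serve as the padding value.
    return {word: i for i, word in enumerate(ranked, start=1)}


def texts_to_sequences(texts, word_index):
    """Replace each known word with its integer; unknown words are skipped."""
    return [
        [word_index[w] for w in text.lower().split() if w in word_index]
        for text in texts
    ]


# Made-up review titles, for illustration only.
titles = ["a good book", "a horrible book", "not a good read"]
word_index = build_word_index(titles)
sequences = texts_to_sequences(titles, word_index)

print(word_index)   # {'a': 1, 'book': 2, 'good': 3, 'horrible': 4, 'not': 5, 'read': 6}
print(sequences)    # [[1, 3, 2], [1, 4, 2], [5, 1, 3, 6]]
```

Notice that the numbers differ from the [1, 12, 3] example above because the mapping depends entirely on the texts the vocabulary was built from.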
PREPROCESSING THE DATA
In order to feed this data into our network, all input reviews must have the same length. Since the reviews differ heavily in length, we need to either trim or pad them so that they are the same length. For this task, we will set the length of reviews to the mean length, which is around 4 words. Reviews shorter than 4 words are padded with zeros; reviews longer than 4 words are trimmed to this length by cutting off any words after the fourth. Keras offers a set of preprocessing routines that can do this for us. In order to pad our reviews, we will need to use the pad_sequences function.
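The padding and trimming rule can be sketched in a few lines of plain Python. The helper name pad_or_trim is hypothetical, and placing the zeros in front of the words is an assumption of this sketch (the text does not say which side is padded). In Keras, calling pad_sequences with maxlen=4, padding='pre' and truncating='post' should give the same behaviour.

```python
def pad_or_trim(sequence, maxlen=4, pad_value=0):
    """Return a list of exactly maxlen integers."""
    # Keep only the first maxlen words, dropping anything after them.
    trimmed = sequence[:maxlen]
    # Zeros are placed in front of the words (an assumption of this sketch).
    padding = [pad_value] * (maxlen - len(trimmed))
    return padding + trimmed


print(pad_or_trim([1, 12, 3]))           # [0, 1, 12, 3]
print(pad_or_trim([5, 1, 3, 6, 9, 2]))   # [5, 1, 3, 6]
```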
BUILD THE MODEL
In the task today you will need to build a recurrent neural network to classify sentiment. The network will need to start with a special layer which will assist with text classification through a process called embedding.
Word embedding is a class of approaches for representing words and documents using dense vectors where a vector represents the projection of the word into a continuous vector space (Brownlee, 2017).
The position of a word within the vector space is learned from the text and is based on the words that surround the word when it is used. The position of a word in the learned vector space is referred to as its embedding.
Keras offers an embedding layer, used for neural networks on text data, and requires that the input data be integer encoded, so each word is represented by a unique integer (Brownlee, 2017). We have already achieved this format through tokenisation.
The embedding layer is trained as a part of the neural network and will learn to map words with similar semantic meanings to similar embedding-vectors. It is initialised with random weights and will learn an embedding for all of the words in the training dataset (Brownlee, 2017).
The Embedding layer is defined as the first hidden layer of a network. It is specified with three arguments (Keras Team, 2020):
● input_dim: This is the size of the vocabulary in the text data. For example, if your data is integer encoded to values between 0 and 5000, then the size of the vocabulary would be 5001 words.
● output_dim: This is the size of the vector space in which words will be embedded. It defines the size of the output vectors from this layer for each word. For example, it could be 32 or 100 or even larger. This is a hyper-parameter that needs to be tuned — test different values for your problem.
● input_length: This is the length of input sequences, as you would define for any input layer of a Keras model. For example, if all of your input documents are comprised of 4 words, this would be 4. This is the length which we padded/trimmed the inputs to during pre-processing.
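Conceptually, an embedding layer is a lookup table with input_dim rows and output_dim columns. The sketch below imitates that lookup in plain Python to show how the three arguments determine the shape of the output. It is only an illustration, not how Keras is implemented: the helper names and the range of the random initial values are assumptions, and nothing is trained here.

```python
import random


def make_embedding_table(input_dim, output_dim, seed=0):
    """One random vector of length output_dim for every integer 0..input_dim-1."""
    rng = random.Random(seed)
    return [
        [rng.uniform(-0.05, 0.05) for _ in range(output_dim)]
        for _ in range(input_dim)
    ]


def embed(sequence, table):
    """Look up the vector for each integer in a padded review."""
    return [table[i] for i in sequence]


table = make_embedding_table(input_dim=5001, output_dim=32)
review = [0, 1, 12, 3]                  # a padded review with input_length = 4
vectors = embed(review, table)

print(len(vectors), len(vectors[0]))    # 4 32
```

Each review of 4 integers therefore becomes 4 vectors of 32 numbers. During training, the network adjusts the rows of the table so that words used in similar ways end up with similar vectors.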
Build a neural network, as outlined below.
● A recurrent neural network. This type of network is commonly used in NLP. This network should have the following architecture:
1. Embedding layer
2. SpatialDropout1D(0.2)
3. BatchNormalization()
4. LSTM(32)
5. Dense(2, activation='softmax')
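When you tune output_dim, it helps to know how the size of the model changes. The sketch below estimates the number of weights in each layer of this architecture using the standard formulas for these layer types with their default settings. The function name is hypothetical, and the vocabulary size of 5001 is borrowed from the input_dim example above; it is not the real vocabulary size of the dataset.

```python
def count_parameters(vocab_size, output_dim, lstm_units=32, n_classes=2):
    """Estimate the number of weights per layer for the architecture above."""
    # Embedding: one vector of length output_dim per word in the vocabulary.
    embedding = vocab_size * output_dim
    # SpatialDropout1D has no weights.
    # BatchNormalization: scale, shift, moving mean and moving variance per feature.
    batch_norm = 4 * output_dim
    # LSTM: four gates, each with input weights, recurrent weights and a bias.
    lstm = 4 * (lstm_units * (output_dim + lstm_units) + lstm_units)
    # Dense: one weight per LSTM unit per class, plus one bias per class.
    dense = lstm_units * n_classes + n_classes
    return {
        "embedding": embedding,
        "batch_norm": batch_norm,
        "lstm": lstm,
        "dense": dense,
        "total": embedding + batch_norm + lstm + dense,
    }


counts = count_parameters(vocab_size=5001, output_dim=50)
print(counts["lstm"])    # 10624
print(counts["total"])   # 260940
```

With these assumed sizes, the embedding layer holds by far the most weights, so output_dim has a large effect on model size. You can compare such an estimate with the output of model.summary() in Keras.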
TRAIN AND TUNE THE MODEL
You are now ready to train your model. Remember to compile your model by specifying the loss function and optimizer we want to use while training, as well as any evaluation metrics we’d like to measure — set the optimizer to ‘adam’ and the loss function to ‘binary_crossentropy’.
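You may wonder why a binary loss is paired with a two-unit softmax output. Assuming the labels are one-hot encoded into two columns (the text does not state this, so treat it as an assumption), the binary cross-entropy averaged over the two outputs equals the categorical cross-entropy. The plain-Python sketch below checks this for a single example; the function names and the example probabilities are made up for illustration.

```python
import math


def binary_crossentropy(y_true, y_pred):
    """Binary cross-entropy averaged over the output units of one sample."""
    terms = [
        -(t * math.log(p) + (1 - t) * math.log(1 - p))
        for t, p in zip(y_true, y_pred)
    ]
    return sum(terms) / len(terms)


def categorical_crossentropy(y_true, y_pred):
    """Categorical cross-entropy for one one-hot encoded sample."""
    return -sum(t * math.log(p) for t, p in zip(y_true, y_pred))


y_true = [0.0, 1.0]   # one-hot label (which index means "positive" is an assumption)
y_pred = [0.3, 0.7]   # an example softmax output; the two values sum to 1

bce = binary_crossentropy(y_true, y_pred)
cce = categorical_crossentropy(y_true, y_pred)
print(math.isclose(bce, cce))   # True
```

The equality holds because the two softmax outputs always sum to 1, so both output units contribute the same term to the binary loss.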
Once compiled, you can start the training process. Note that there are two important training parameters that we have to specify, namely batch size and the number of training epochs. Together with our model architecture, these parameters determine the total training time.
For the network:
● Set the number of epochs to train for to 5 and the batch size to 10.
● Tune the output_dim hyper-parameter of the embedding layer. Try values: 10, 25, 50 and 100. Report on the performance metrics for each value.
● Select the output_dim which gives the best performance on the test set and plot a graph of both the accuracy and loss of the model while training. Use these graphs to determine the point at which the model starts to overfit, or whether it has not yet converged. Identify a better number of epochs to train for.
● You can also try tuning other hyper-parameters, such as batch size, to get the best possible performance.
● Report on the performance metrics of the final model.
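Besides reading the graphs by eye, you can locate the turning point programmatically. The object returned by model.fit in Keras has a history attribute, a dictionary of per-epoch values that includes 'val_loss' when validation data is supplied. The sketch below finds the epoch with the lowest validation loss. The helper name is hypothetical, and the numbers are placeholders for illustration only, not results from this task.

```python
def best_epoch(history):
    """Return the 1-based epoch with the lowest validation loss."""
    val_loss = history["val_loss"]
    return min(range(len(val_loss)), key=val_loss.__getitem__) + 1


# Placeholder numbers for illustration only; they are not results from this task.
example_history = {
    "loss":     [0.69, 0.55, 0.42, 0.31, 0.22],
    "val_loss": [0.68, 0.58, 0.54, 0.57, 0.63],
}

print(best_epoch(example_history))   # 3
```

In this made-up example the training loss keeps falling while the validation loss rises after epoch 3, which is the usual sign of overfitting. If the lowest validation loss falls on the last epoch, the model may not have converged yet.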
Finally, we would like to be able to use our model to predict something. To do this, we need to translate the sentence into the relevant integers and pad as necessary. This will allow us to put it into our model and see whether it predicts if we will like or dislike the book. A small selection of samples has been provided to get you going — you are welcome to add to this.
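The prediction step can be sketched as follows. This is plain Python with hypothetical helper names, a toy vocabulary taken from the “a good book” example, and made-up probabilities standing in for the model's output. Whether index 0 or index 1 means “positive” depends on how the labels were encoded, so the class order here is an assumption. Words missing from the vocabulary are simply skipped.

```python
def encode_for_model(text, word_index, maxlen=4):
    """Turn a sentence into a fixed-length list of integers."""
    # Words that are not in the vocabulary are skipped.
    ids = [word_index[w] for w in text.lower().split() if w in word_index]
    ids = ids[:maxlen]
    # Pad on the same side as during training (front padding is assumed here).
    return [0] * (maxlen - len(ids)) + ids


def label_from_probabilities(probs, class_names=("negative", "positive")):
    """Pick the class with the highest predicted probability."""
    best = max(range(len(probs)), key=probs.__getitem__)
    return class_names[best]


# Toy vocabulary matching the "a good book" example; not the real one.
toy_word_index = {"a": 1, "book": 3, "good": 12}

print(encode_for_model("A good book indeed", toy_word_index))   # [0, 1, 12, 3]
print(label_from_probabilities([0.2, 0.8]))                     # positive
```

The encoded list is what you would pass to the trained model, and the model's two softmax outputs are what label_from_probabilities expects.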
A notebook is associated with this task, which contains some useful functions/code to get you started.
Follow these steps:
● Use the provided files positive.txt and negative.txt and follow the above steps to train a recurrent neural network. Your goal is to classify the sentiment of book reviews as positive or negative.
● Note: Some blocks of code are labelled “do not modify”. Make sure that the code in these blocks is left alone, or else you will encounter issues. Do read through it and see if you can follow what it does.
● When complete, create a basic point-form summary file in which you describe your project in detail.