Blog: Bayes Text Classification in Kotlin for Android without TensorFlow


Photo by Luca Bravo on Unsplash

Text classification is an important task in Natural Language Processing because of its wide range of uses. We will learn to do it in a non-deep-learning way, without TensorFlow or neural networks. The classifier will run inside an Android application, so it needs to be written in Kotlin or Java.

But why Kotlin? Why not TensorFlow or Python?

Why aren’t we using TensorFlow? Because it’s written in C++, its models are constructed in Python, and we need to run our classifier in Kotlin!

TensorFlow and TensorFlow Lite can work efficiently ( sometimes in mind-blowing ways ) on Android. But if this algorithm can be written in Kotlin ( native to Android ), a similar one could be written in any programming language, such as C, C++ or even Swift ( native to iOS ).

Sometimes a classifier coded natively for a platform can perform better than TensorFlow or its APIs. We also get more control over how it works and how it runs inference.

Which Machine Learning algorithm are we going to use? What exactly are we creating?

We will write a Naive Bayes text classifier in Kotlin, which will ultimately run on an Android device.

Math is coming! Be ready!

Naive Bayes Text Classification uses Bayes Theorem to classify a document ( text ) into a certain class.

Bayes Theorem
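
For reference, the standard statement of Bayes Theorem for two events A and B ( with P( B ) > 0 ) can be written in LaTeX as follows. The symbols A and B are generic placeholders, not names used elsewhere in this article.

```latex
P(A \mid B) = \frac{P(B \mid A)\, P(A)}{P(B)}
```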

If we adapt the equation to our needs for text classification, it becomes:

Eq.1

Here we represent our document as tokens x₁, x₂ … xₙ, and C is the class for which we calculate the probability. The denominator P( x₁, x₂ … xₙ ) is omitted because it is the same for both classes ( C₁ and C₂ ): it acts only as a normalizing constant and does not change which class scores higher.

We will calculate the probabilities for two classes, namely SPAM ( C₁ ) and HAM ( C₂ ). The one with the higher probability will be our output.

For each class, we have a vocabulary: a set of words that occur in spam or ham messages. This set represents the corpus of that class.
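
Written out, Eq.1 can be reconstructed from the description above roughly as follows. This is a sketch that assumes the usual "naive" step, where the tokens are treated as independent given the class, so the joint likelihood factors into a product.

```latex
P(C \mid x_1, x_2, \dots, x_n)
  = \frac{P(C)\, P(x_1, x_2, \dots, x_n \mid C)}{P(x_1, x_2, \dots, x_n)}
  \;\propto\; P(C) \prod_{i=1}^{n} P(x_i \mid C)
```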

Let’s Start With Kotlin.

If you loved Python earlier!

First, we will define our corpora positiveBagOfWords and negativeBagOfWords, which contain spam and ham words respectively.
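
A minimal sketch of what the two corpora could look like. The names positiveBagOfWords and negativeBagOfWords come from the text; the words inside them are placeholders chosen for illustration, not the article's actual data.

```kotlin
// Placeholder spam words (class C1). A word may appear more than once;
// repeated entries count as repeated occurrences.
val positiveBagOfWords = listOf("free", "win", "prize", "offer", "click", "free")

// Placeholder ham words (class C2).
val negativeBagOfWords = listOf("meeting", "lunch", "tomorrow", "thanks", "project")
```

In a real app these lists would be built from labelled messages rather than typed by hand.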

Now, we create a new class named Classifier for handling the classification task. We need to define two constants and a method which extracts tokens from a given piece of text ( by removing unnecessary words, punctuation etc.).

In other words, getTokens( document ) = tokens, so we can transform a document D into a set of tokens x₁, x₂ … xₙ.
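
One way the token extraction could be sketched is shown below. The class name Classifier and the method name getTokens come from the text; the stop-word list and the cleaning rules are assumptions made for this example, and the two constants mentioned above are not reproduced because the text does not say what they are.

```kotlin
class Classifier {

    // Hypothetical stop-word list; extend it for real use.
    private val stopWords = setOf("a", "an", "the", "is", "to", "and", "of", "in")

    // Lowercases the text, replaces every character that is not a-z or
    // whitespace with a space, splits on whitespace, and drops empty
    // strings and stop words.
    fun getTokens(document: String): List<String> =
        document.lowercase()
            .replace(Regex("[^a-z\\s]"), " ")
            .split(Regex("\\s+"))
            .filter { it.isNotEmpty() && it !in stopWords }
}
```

Calling Classifier().getTokens("Win a FREE prize, now!") would yield the tokens win, free, prize and now.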

Finding the probabilities

First, we need to find P( C ), the class probability. This is simply the fraction of words, across both corpora, that belong to class C.

Class probabilities. Eq.2
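
Based on the description above, Eq.2 can be reconstructed roughly as follows. The symbol N_C ( the number of words in the corpus of class C ) is notation introduced here for illustration.

```latex
P(C) = \frac{N_C}{N_{C_1} + N_{C_2}}, \qquad C \in \{C_1, C_2\}
```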

Calculates Eq.2
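
The class probability could be computed along these lines. The function name findClassProbability and its parameters are hypothetical; the logic follows the description above, where P( C ) is the share of all corpus words that belong to class C.

```kotlin
// P(C): number of words in the corpus of class C divided by the
// total number of words in both corpora.
fun findClassProbability(classVocab: List<String>, otherVocab: List<String>): Double {
    val total = classVocab.size + otherVocab.size
    require(total > 0) { "Both corpora are empty" }
    return classVocab.size.toDouble() / total
}
```

With three spam words and one ham word, the spam class probability is 3 / 4 and the ham class probability is 1 / 4.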

Next, we need to find P( X | C ) which is the probability of X given that it belongs to a class C.

Before this, we need a method that finds P( xᵢ | C ) for a token xᵢ of the given document. We can use this method.

Calculates P( xᵢ | C )

Here class_vocab is one of the corpora. It represents the C in P( xᵢ | C ). Wondering where the 1 came from? That’s Laplace smoothing. Without it, P( xᵢ | C ) would be 0 whenever xᵢ does not exist in our corpus, and since the probabilities are multiplied together, the whole P( X | C ) would become 0. Adding 1 solves this problem.

Now we multiply all the P( xᵢ | C ) together and finally multiply the product by P( C ), our class probability.
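
A sketch of P( xᵢ | C ) with add-one smoothing is given below. The function name and the vocabularySize parameter are assumptions: this is the common textbook form, where the denominator is the number of words in the class corpus plus the number of distinct words across both corpora, and the exact denominator used in the original code may differ.

```kotlin
// P(x_i | C) with add-one (Laplace) smoothing.
// classVocab:     words of class C; duplicates count as repeated occurrences.
// vocabularySize: number of distinct words across both corpora (assumed).
fun findProbabilityGivenClass(
    token: String,
    classVocab: List<String>,
    vocabularySize: Int
): Double {
    val occurrences = classVocab.count { it == token }
    return (occurrences + 1).toDouble() / (classVocab.size + vocabularySize)
}
```

A token that never appears in the class corpus still gets the small non-zero probability 1 / ( classVocab.size + vocabularySize ), which is what keeps the final product from collapsing to 0.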

Calculates Eq.1
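
The product in Eq.1 could be sketched as below. The function name findClassScore and its parameters are hypothetical, and the per-token term uses the same assumed add-one smoothing ( class corpus size plus number of distinct words in the denominator ).

```kotlin
// Unnormalised score for one class: P(C) * product of P(x_i | C).
fun findClassScore(
    tokens: List<String>,
    classVocab: List<String>,
    classProbability: Double,
    vocabularySize: Int
): Double {
    var score = classProbability
    for (token in tokens) {
        val occurrences = classVocab.count { it == token }
        // Add-one smoothed P(x_i | C).
        score *= (occurrences + 1).toDouble() / (classVocab.size + vocabularySize)
    }
    return score
}
```

For long documents a product of many small numbers can underflow to 0.0; summing logarithms of the same terms is a common alternative that preserves the ordering of the two classes.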

That’s all. Now we need to check which class has the higher probability.
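
The final comparison could look like this minimal sketch. The function name classify and the string labels are assumptions, and sending ties to HAM is an arbitrary choice made here.

```kotlin
// Returns the label of the class with the higher score.
// Ties are resolved in favour of HAM (an arbitrary choice for this sketch).
fun classify(spamScore: Double, hamScore: Double): String =
    if (spamScore > hamScore) "SPAM" else "HAM"
```

Because the omitted denominator is the same for both classes, comparing the two unnormalised scores gives the same answer as comparing the true probabilities.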

You can see the full code at one glance in this gist.

That’s long and a bit Math-y. It’s the end!

Hope you liked the idea of Naive Bayes in Kotlin. Feel free to share your feedback in the comments section below.

It’s my first math-heavy article, so please excuse any mistakes in notation or precision. :-)

Happy Machine Learning.

Source: Artificial Intelligence on Medium
