A Simple Browser-Based Alphanumeric Prediction and Text-to-Speech Using TensorFlow

Archan Ghosh
3 min readMay 9, 2020

This project was meant to solve the fairly menial problem of character identification, but it was built from the ground up so that beginners can follow along and get a good idea of what really goes on inside a fully deployed deep learning project.

What Our Model Does

Our model is a real-time, browser-based alphanumeric classifier. We implemented a very simple CNN, trained it on two datasets, and deployed it with Flask.

The Elements Implemented

  • Alphabet dataset from Kaggle (linked below)
  • MNIST dataset
  • Custom CNN architecture
  • gTTS (to provide real-time text-to-speech in the browser)

Datasets

We used the two datasets mentioned above:

  • MNIST — 28x28 grayscale digit images (0-9)
  • Kaggle alphabet dataset — 28x28 grayscale letter images (A-Z)

Preparing The Dataset

Next, we split each dataset with train_test_split and then combine the MNIST and alphabet data into one.
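The split step might look like the following sketch. The variable names mirror the combination code below, but the loading itself is replaced by dummy arrays for illustration — the shapes and the 80/20 split ratio are our assumptions, not necessarily the project's:

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Dummy stand-ins for the loaded datasets (shapes are illustrative)
X_char = np.random.rand(100, 28, 28, 1)      # alphabet images
y_char = np.random.randint(0, 26, size=100)  # letter labels
X_num = np.random.rand(100, 28, 28, 1)       # MNIST digit images
y_num = np.random.randint(0, 10, size=100)   # digit labels

# Split each dataset separately before combining them
X_char_train, X_char_test, y_char_train, y_char_test = train_test_split(
    X_char, y_char, test_size=0.2, random_state=42)
X_num_train, X_num_test, y_num_train, y_num_test = train_test_split(
    X_num, y_num, test_size=0.2, random_state=42)
```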

import numpy as np
import pandas as pd
import tensorflow as tf

# Stack the letter and digit splits into single train/test sets
X_train = np.vstack([X_char_train, X_num_train])
X_test = np.vstack([X_char_test, X_num_test])
y_train = pd.concat([y_char_train, y_num_train], axis=0, ignore_index=True)
y_test = pd.concat([y_char_test, y_num_test], axis=0, ignore_index=True)
# One-hot encode the 36 classes (10 digits + 26 letters)
y_train = tf.keras.utils.to_categorical(y_train)
y_test = tf.keras.utils.to_categorical(y_test)
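One detail worth making explicit: before concatenation, the two datasets must not share label values, since the model outputs 36 classes. A common scheme keeps digits at 0-9 and offsets letters to 10-35; the direction of the offset here is our assumption, not necessarily what the project uses:

```python
import numpy as np

digit_labels = np.array([0, 3, 9])     # MNIST labels stay in 0-9
letter_labels = np.array([0, 1, 25])   # alphabet labels arrive as 0-25 (A-Z)

# Shift letters so the combined label space is 0-35 with no collisions
letter_labels_shifted = letter_labels + 10  # A -> 10, B -> 11, ..., Z -> 35

combined = np.concatenate([digit_labels, letter_labels_shifted])
```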

Classifier Model Creation and Fitting

The CNN we used is very simple; we kept it that way to make the model easier to understand and to debug.

model = keras.models.Sequential([
    keras.layers.Conv2D(32, 3, activation='relu', input_shape=[28, 28, 1]),
    keras.layers.Conv2D(64, (3, 3), activation='relu'),
    keras.layers.MaxPooling2D(pool_size=2),
    keras.layers.Dropout(0.4),
    keras.layers.Flatten(),
    keras.layers.Dense(128, activation='relu'),
    keras.layers.Dense(36, activation='softmax'),  # 10 digits + 26 letters
])

Fitting

We trained for a total of 18 epochs using the Adam optimizer, which more or less gave us a metric to work with.

model.compile(optimizer='adam', loss=keras.losses.categorical_crossentropy, metrics=['accuracy'])
history = model.fit(X_train, y_train, validation_data=(X_test, y_test), epochs=18)

The following loss was obtained:

[Figure: train loss vs. test loss]

Since we use the test data for validation, the test loss here is the validation loss.
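Curves like the one above can be drawn from the History object that model.fit returns. A minimal sketch, using placeholder loss values in place of a real training run (in practice you would read history.history directly):

```python
import matplotlib
matplotlib.use("Agg")  # render off-screen; no display needed
import matplotlib.pyplot as plt

# Placeholder values standing in for history.history after model.fit()
history_dict = {
    "loss": [1.2, 0.6, 0.35, 0.2],
    "val_loss": [1.1, 0.7, 0.45, 0.4],
}

plt.plot(history_dict["loss"], label="train loss")
plt.plot(history_dict["val_loss"], label="test (validation) loss")
plt.xlabel("epoch")
plt.ylabel("categorical cross-entropy")
plt.legend()
plt.savefig("loss_curves.png")
```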

FINAL DEPLOYMENT & PREDICTION ON BROWSER

The model was saved as an .h5 file, which was then imported into a Flask application. The Flask application contained a basic template design with the following elements:

  • [1] A drawable area for alphanumeric characters
  • [2] A visual score of the predicted output
  • [3] A gTTS prompt that converts the predicted character to speech

[1] The Drawable Area: Whatever was drawn in the square area was resized, using OpenCV, into a grayscale 28x28 image with a batch size of 1, as required by our model for prediction.

[2] The Visual Prediction: We displayed the class predicted by the model.

[3] The gTTS output: The final part was a gTTS prompt that was played immediately after prediction.

Prediction Demo

Conclusion

We hope this gives a clear idea of how simple it is to design and deploy a deep learning model. Furthermore, this particular model can be improved with a better design and extended to bigger problems, such as in-line text identification using the IAM dataset and in-line translation.
This project was an effort from students, for students, who might think that deep learning is tough and complex when in reality it is easy and very straightforward.

Here is the GitHub link for the project which can be used:
https://github.com/ArchanGhosh/Handwriting-Detection-with-TTS

Co-Authors

: Junior in B.tech Computer Science

: Sophomore in B.tech Computer Science
