About

Here is a little background

I am a Quantitative Software Engineer currently working with Goldman Sachs in the Asset Management Division based in Dallas, Texas. I recently completed a Master's in Computer Science from The University of Texas at Dallas, where I specialized in Machine & Deep Learning. During my studies, I worked part-time with ArtSciLab as a Data Analyst and Web Developer, which involved managing web infrastructure and conducting research in the domain of Natural Language Processing.

Before completing my Bachelor's in Computer Science at the Indian Institute of Information Technology in 2020, I worked in Data Science, Machine Learning, and Full Stack Web Development at multiple companies and on various side projects and hackathons. I have also been actively researching Computer Vision and Natural Language Processing, and have published a few papers in these fields. I was also a Google Explore ML Facilitator, training students on campus in Machine Learning.

In my free time, you can find me involved in various side projects, learning new skills, attending hackathons, and reading!

Experience

Quantitative Software Engineer

Goldman Sachs

Dallas, TX, United States
May 2022 - Present

  • Asset & Wealth Management Division

Web Developer

ArtSciLab

Richardson, TX, United States
Jun 2021 - May 2022

  • Managed, redesigned and improved web projects along with servers, database infrastructure and access control of the ArtSciLab.
  • Updated the web documentation and streamlined the update pipeline for better control and more efficient rollout of new features.
  • Tech Stack: Backend and CMS - WordPress, Frontend - Next.js (React), Hosting - Digital Ocean, Analytics - Google Analytics, Version Control - GitHub

Data Analyst

ArtSciLab

Richardson, TX, United States
Jun 2021 - May 2022

  • Performed Natural Language Processing and Data Visualization based analysis of virtual Microsoft Teams meetings and conversations to extract hidden insights and sentiments of the participants.
  • Analyzed Reddit data for risk prediction using Natural Language Processing, Intensity Analysis and Topic Modelling, and published the peer-reviewed white paper "STONKS - Analyzing Financial Discussions on Reddit".

Grader & Student Assistant

The University of Texas at Dallas

Richardson, TX, United States
Feb 2021 - Aug 2021

  • Debugged, improved and upgraded the Java and Spring based "TA Assignment System" to the latest tech stack for use by the CS department at The University of Texas at Dallas.
  • Assisted Prof. James Willson with grading the CS 3305 (Discrete Mathematics for Computing) class for the Spring 2021 semester in the Computer Science department at The University of Texas at Dallas.

Data Science Intern

Affine

Bengaluru, India
Jan 2020 - Jul 2020

  • Assisted in implementing an end-to-end solution for the problem of 'Sales Quota Prediction' for a Fortune 100 client.
  • Worked on EDA techniques, Time Series and Machine Learning models to predict quotas at different geographic levels.
  • Transformed the data cleaning pipeline, along with Python-based report generation, to reduce development time by 50%.

Deep Learning Research Intern

Indian Institute of Technology - BHU

Varanasi, India
May 2019 - Jul 2019

  • Worked on classification, identification, and segmentation problems for breast cancer detection.
  • Used generative adversarial network (GAN) based techniques to solve the problem of class imbalance.
  • Classified and segmented images using an end-to-end CNN and U-Net based model with 92% accuracy.

Web Development Intern

The Bomway

Mumbai, India
May 2018 - Jul 2018

  • Designed and deployed a complete admin panel for adding and monitoring content, users, and arts.
  • Developed a shopping cart from scratch using Flask, with payment, delivery and interface system integration.
  • Integrated adding and monitoring of events and blog posts on social media, increasing user retention by over 60%.

Skills

Machine & Deep Learning

Convolutional & Recurrent Networks

Natural Language Processing

Data Analysis

Data Visualization

Data Pipelines and Engineering

Computer Vision

Full Stack Application Development

Programming Languages

Python, JavaScript, TypeScript, Kotlin, HTML, CSS, C

Web Frameworks

Django, Flask, Node, Express, React, Next.js, Spring, WordPress

ML Frameworks

Scikit-learn, TensorFlow, Keras, PyTorch, OpenCV, NLTK, IBM Watson Studio

Data Analysis

Tableau, Excel, Pandas, Matplotlib, IBM Cognos

Databases

SQL, Oracle, MongoDB, Snowflake

Publications

STONKS : Analyzing Financial Discussions on Reddit

ArtSciLab
Apr 2022

Social media has become an important part of digital web life, and its effect on financial decisions is also increasing. Social media conversations on websites like Twitter, Reddit and Facebook are having an ever-increasing effect on stock prices as well as on the way companies make decisions. This has made it important to analyze this data to make accurate predictions of future stock prices. In this paper, we try to analyze financial discussions among users of Reddit by extracting hidden patterns, themes and user characters to predict future actions and consequences on the market.

Sarcasm Detection of Media Text Using Deep Neural Networks

ICACNI 2019 - Springer, Singapore
Nov 2020

Sarcasm detection in media text is a binary classification task in which text is written either literally or sarcastically (with irony), where the intended meaning is the opposite of what is seemingly expressed. Performing sarcasm detection can be very useful in improving the performance of sentiment analysis, where existing models fail to identify sarcasm at all. We examine the use of deep neural networks in this paper to detect sarcasm in social media text (specifically Twitter data) as well as news headlines, and compare the results. Results show that deep neural networks with the inclusion of word embeddings, bidirectional LSTMs and convolutional networks achieve better accuracy of around 88 percent for sarcasm detection.

News Background Linking Using Document Similarity Techniques

ICACNI 2019 - Springer, Singapore
Nov 2020

Newsreaders still look forward to well-established news sources for a more accurate and detailed analysis of the news or articles. In this paper, we present a model that can retrieve other news articles that provide essential context and background information related to the news article that is being looked into. The abstract topics have been recognized using Latent Dirichlet Allocation. A similarity matrix has been created after the extraction of topics using two methods, cosine similarity and Jensen–Shannon divergence. This paper describes the implementation and gives a brief description of both the similarity methods. It also gives a comparative study of both methods.

Hackathons

WeHack II

The University of Texas at Dallas
(2022)

HackUTD IX

The University of Texas at Dallas
(2022)

HackUTD VIII

The University of Texas at Dallas
(2021)

HackUTD VII

The University of Texas at Dallas
(2020)

Hack in the North

IIIT Allahabad
(2019)

Smart India Hackathon

Govt. of India
(2019)

Hack36

NIT Allahabad
(2019)

iHack

IIT Bombay
(2019)

Education

Master of Science in Computer Science

The University of Texas at Dallas

3.758/4.0
Aug 2020 - May 2022

  • Database Design
  • Design and Analysis of Algorithms
  • Object Oriented Analysis and Design

  • Artificial Intelligence
  • Machine Learning
  • Data Representations
  • Natural Language Processing
  • Convolutional Neural Networks

  • Web Programming Languages
  • Big Data Management and Analytics
  • Cloud Computing

Bachelor of Technology in Computer Science

Indian Institute of Information Technology Kalyani

9.17/10.0
Aug 2016 - May 2020

  • Machine Learning
  • Natural Language Processing
  • Artificial Intelligence
  • Data Analytics and Optimization
  • Database Management Systems

  • Data Structures and Algorithms
  • Discrete Mathematics
  • Python and Numerical Analysis

  • Compiler Design
  • Computer Networks
  • Computer Organization and Architecture
  • Operating Systems

Projects

💲 STONKS : Analyzing Financial Discussions on Reddit

Jun 2021 - Apr 2022

Social media has become an important part of digital web life, and its effect on financial decisions is also increasing. Social media conversations on websites like Twitter, Reddit and Facebook are having an ever-increasing effect on stock prices as well as on the way companies make decisions. This has made it important to analyze this data to make accurate predictions of future stock prices. In this paper, we try to analyze financial discussions among users of Reddit by extracting hidden patterns, themes and user characters to predict future actions and consequences on the market.

We explore techniques based on natural language processing to pass conversations through a data pipeline, along with extracting stock tickers, manual as well as automatic theme extraction, and word clouds. We also discuss common text processing techniques that can be applied to other problems involving text analysis, along with a correlation model focusing on sentiment analysis as a predictor of stock movement.
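As a rough illustration of the ticker-extraction step mentioned above, here is a minimal sketch. It is not the code from the paper: the regular expression, the `KNOWN_TICKERS` allow-list and the function names are all assumptions made for this example.

```python
import re
from collections import Counter

# Assumption: tickers appear either as cashtags ("$gme") or as bare
# 2-5 letter uppercase words ("TSLA"). Real Reddit text is messier.
TICKER_PATTERN = re.compile(r"\$([A-Za-z]{1,5})\b|\b([A-Z]{2,5})\b")

# Hypothetical allow-list; a real pipeline would load exchange listings.
KNOWN_TICKERS = {"GME", "AMC", "TSLA", "AAPL"}


def extract_tickers(text: str) -> list[str]:
    """Return known tickers mentioned in text, in order of appearance."""
    found = []
    for cashtag, bare in TICKER_PATTERN.findall(text):
        symbol = (cashtag or bare).upper()
        # The allow-list filters out uppercase slang such as "YOLO" or "HOLD".
        if symbol in KNOWN_TICKERS:
            found.append(symbol)
    return found


def count_tickers(posts: list[str]) -> Counter:
    """Count ticker mentions across a collection of posts."""
    counts = Counter()
    for post in posts:
        counts.update(extract_tickers(post))
    return counts
```

Filtering against an allow-list is one simple way to keep uppercase slang from being counted as a ticker; other approaches are certainly possible.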

🤝 This is what our meetings say about us!

Jun 2021 - Apr 2022

The pandemic moved us to a virtual world, and video meetings changed the way we interact with each other. The O.A.S.E.S (Our Art Science Experimental Seminars), previously called Watering Holes, conducted by ArtSciLab also moved to a hybrid format, and it was a great opportunity to analyze the meetings from a data perspective using recordings. This white paper summarizes the step-by-step procedure to visualize these seminars, right from collecting data to creating visualizations.

We explore techniques based on natural language processing to pass conversations through a data pipeline, along with creating word cloud visualizations and analyzing sentiment of the text. We also discuss the ways to extend this idea to create time-varying word clouds and video recordings of the meetings.

💦 Real-Time Water Flow Operation Optimization

Nov 2021 - Nov 2021

As part of a HackUTD challenge sponsored by EOG Resources, the problem involved fetching the total water available, dividing it among multiple operations, and increasing revenue as much as possible.

EOG wells are constantly producing water, and we worked on finding ways to reduce expenditures, as well as reuse and recycle water whenever possible. The challenge was to develop an application that can process a steady stream of real-time sensor data to both optimize and visualize the distribution of water for the company's upstream operations.

ऋ Language Identification using Ensembled Deep Neural Networks

Jun 2021 - Mar 2021

Language identification (LI) is a multi-class classification problem where the task is to determine the natural language of the given text. Various other natural language processing tasks assume that the language of the text is known. Language identification enables automatically detecting the source language in tools like Google Translate before any further translation. We examine the use of an ensemble of deep neural networks for this task of language identification in a monolingual benchmark dataset. Results show that deep neural networks with the help of an integrated ensemble of bidirectional LSTMs with varying feature extraction techniques achieve better accuracy of around 98 percent for language identification.

🏠 Roomies - Find Roomies Tinder Style

Feb 2021 - Feb 2021

It's not easy to find the perfect room and roommates when you go 1000 miles away from home for international study. Roomies is a platform to help you automatically find roommates according to habits and preferences in an easy-to-use, Tinder-like style!

Developed as a HackUTD 2021 submission, Roomies asks the user questions about gender, age group, country, university, starting semester, course, food preferences, smoking and drinking habits, and cooking experience, as well as their expectations of roommates in similar areas. A matching algorithm then calculates scores and recommends the top results to the user.
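The scoring idea can be sketched roughly as below. This is an illustrative guess at how such a matcher might work, not the hackathon code: the `WEIGHTS` table, the `Profile` class and the agreement-based score are all hypothetical.

```python
from dataclasses import dataclass, field

# Hypothetical weights: how much each question counts toward the score.
WEIGHTS = {
    "university": 3.0,
    "semester": 2.0,
    "food": 1.5,
    "smoking": 2.5,
    "cooking": 1.0,
}


@dataclass
class Profile:
    name: str
    answers: dict = field(default_factory=dict)


def match_score(a: Profile, b: Profile) -> float:
    """Weighted fraction of questions on which two profiles agree (0.0 to 1.0)."""
    total = sum(WEIGHTS.values())
    agreed = sum(
        weight
        for question, weight in WEIGHTS.items()
        if question in a.answers and a.answers[question] == b.answers.get(question)
    )
    return agreed / total


def top_matches(user: Profile, candidates: list, k: int = 3) -> list:
    """Return the k best (name, score) pairs, highest score first."""
    scored = [(c.name, match_score(user, c)) for c in candidates if c.name != user.name]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:k]
```

Weighting some questions more heavily than others is a design choice assumed here for illustration; a uniform weighting would work the same way with all weights set to 1.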

🧑 Person Re-identification using Deep Learning

Jul 2019 - May 2020

Person Re-identification (Re-ID) is a fundamental subject in the field of computer vision technologies due to its application and research significance. The purpose of this task is to spot an individual in other cameras. This work explores conventional Re-ID approaches and then focuses on the latest developments, including methods and techniques using deep learning. In this work, we propose an efficient approach in which we first eliminate dissimilar images using k-medoids clustering. A silhouette part-based feature extraction scheme is adopted at each level of the hierarchy to preserve the relative locations of the different body structures and make the appearance descriptors more discriminating in nature.

📰 News Recommendations using Background Linking

Jan 2019 - Dec 2019

Developed Cosine similarity and Jensen–Shannon divergence based news recommendation approach on 'TREC News Dataset' using background information linking, mentored by Prof. Tirthankar Dasgupta (TCS).

Newsreaders still look forward to well-established news sources for a more accurate and detailed analysis of the news or articles. In this paper, we present a model that can retrieve other news articles that provide essential context and background information related to the news article that is being looked into. The abstract topics have been recognized using Latent Dirichlet Allocation. A similarity matrix has been created after the extraction of topics using two methods, cosine similarity and Jensen–Shannon divergence. This paper describes the implementation and gives a brief description of both the similarity methods. It also gives a comparative study of both methods.
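To make the two similarity measures concrete, here is a small sketch that compares two topic distributions, such as the per-document topic proportions produced by Latent Dirichlet Allocation. It is an illustration only, not the paper's implementation; the function names are chosen for this example, and the base-2 logarithm is an assumption that bounds the divergence between 0 and 1.

```python
import math


def cosine_similarity(p, q):
    """Cosine of the angle between two non-zero vectors (1.0 = same direction)."""
    dot = sum(a * b for a, b in zip(p, q))
    norm_p = math.sqrt(sum(a * a for a in p))
    norm_q = math.sqrt(sum(b * b for b in q))
    return dot / (norm_p * norm_q)


def kl_divergence(p, q):
    """Kullback-Leibler divergence in bits; terms with p_i == 0 contribute 0."""
    return sum(a * math.log2(a / b) for a, b in zip(p, q) if a > 0)


def js_divergence(p, q):
    """Jensen-Shannon divergence: symmetric, 0.0 for identical distributions."""
    # m is the midpoint distribution, so it is non-zero wherever p or q is.
    m = [(a + b) / 2 for a, b in zip(p, q)]
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)
```

Note that the two measures point in opposite directions: a higher cosine similarity means more similar documents, while a lower Jensen–Shannon divergence does.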

💬 Sarcasm Detection of Media Text Using Deep Neural Networks

Jul 2018 - May 2019

Sarcasm detection in media text is a binary classification task in which text is written either literally or sarcastically (with irony), where the intended meaning is the opposite of what is seemingly expressed. Performing sarcasm detection can be very useful in improving the performance of sentiment analysis, where existing models fail to identify sarcasm at all. We examine the use of deep neural networks in this paper to detect sarcasm in social media text (specifically Twitter data) as well as news headlines, and compare the results. Results show that deep neural networks with the inclusion of word embeddings, bidirectional LSTMs and convolutional networks achieve better accuracy of around 88 percent for sarcasm detection.

🧠 Brain Image Segmentation and Tumor Detection

Jan 2019 - Mar 2019

Finalist at Smart India Hackathon - 2019

Medical Datasets: BraTS 2018 & MrBrains18

Models: Modified VGG16 & Modified UNetCNN

Interface: A GUI built with Python PyQt that visualizes the results along with accuracy in real time, and a website built with Django that works the same way as the GUI but with easier and better visualization

Results: Brain Image Segmentation in the 4D dataset with average dice score of 0.84 and Tumor classification with average dice score of 0.96
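For readers unfamiliar with the metric, the Dice score measures the overlap between a predicted mask and the ground-truth mask. The sketch below computes it for flattened binary masks; it is a generic illustration rather than the project's evaluation code, and returning 1.0 when both masks are empty is a convention assumed here.

```python
def dice_score(pred, truth):
    """Dice coefficient 2*|A & B| / (|A| + |B|) for two flattened binary masks."""
    if len(pred) != len(truth):
        raise ValueError("masks must have the same number of elements")
    pred_on = {i for i, v in enumerate(pred) if v}
    truth_on = {i for i, v in enumerate(truth) if v}
    if not pred_on and not truth_on:
        # Assumed convention: two empty masks count as a perfect match.
        return 1.0
    return 2 * len(pred_on & truth_on) / (len(pred_on) + len(truth_on))
```

A score of 1.0 means the two masks coincide exactly, and 0.0 means they share no foreground pixels.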

🔳 RGBD Object Detection

Jan 2019 - Jan 2019

Designed as part of iHack 2019, IIT Bombay, it's a Fast-RCNN and YOLOv3 based object detection model for RGBD images. The model is able to detect occlusions in the RGBD images and also detect objects in the images and videos.

This solution helps solve problems such as better representation of crowded images in real-life situations like crowd control and traffic control, use of depth information for simultaneous image detection in noisy images, and better side-view prediction probability.

👪 FamilyWhere - A Deep Learning based solution for finding children and people lost in public gatherings

Jan 2019 - Jan 2019

Workflow: 1. When someone realizes that their child or loved one is lost in the crowd, they upload a single photo of the missing person along with their own contact information. 2. The model stores the image in the database with proper encoding. 3. Whenever another person finds the child or missing person, they take a photo in the app, and the 3 most probable matches are shown along with contact numbers.

USP: 1. Easy-to-use interface with simple 2-step user actions. 2. The user needs to upload only one image of the child for face recognition, instead of the few hundred required by other deep learning based approaches. 3. High accuracy in identifying the child or person correctly.

🗞 RealU - Unbiased Untampered News and Media Experience

Mar 2018 - Apr 2018

Developed as a Hack In The North 2019 submission, RealU is a deep learning based solution that helps you detect fake media and stay away from it. Along with the news text, it also analyses the images related to the news using Google's cloud APIs to give the user the best possible prediction of whether the news is fake or tampered.

👨‍🎓 Automatic Class Attendance System

Jan 2017 - Nov 2017

This is an image-processing based attendance system, coded entirely in Python.

The project is divided into 3 parts: 1. Registration: We collect data and images of the students. 2. Attendance Marking: We record a video while the class is in session. Then, we extract some frames from the video and extract faces from them. Finally, we match the extracted faces to those registered and mark the attendance. 3. Displaying Attendance: We run basic DBMS queries and display the result in a properly formatted way.

We used PyQt for GUI, OpenCV & face_recognize modules for detecting and matching faces.

Contact

+1 469 649 3928

omkarajnadkar@gmail.com

Dallas, TX, USA