Technologist Intern
Jun 2019 — Aug 2019
• Developed a topic modeler in Python using the SeededLDA algorithm that automatically tagged articles by topic, widening the articles' reach to the audience and reducing human effort by 50%.
• Designed a report showing the number of articles under each topic, helping content writers focus on underwritten topics.
Software Engineer
Nov 2015 — Sep 2018
• Built 7 migration projects for international clients in banking and in the energy and resources sector.
• Migrated close to 70 files from diverse sources such as Salesforce, databases, XML, EBCDIC, and ASCII to HDFS, creating an enterprise data warehouse that reduced storage cost by 45%.
• Worked with the downstream user team to extract data from Hive and Impala, performed exploratory data analysis, and created Tableau reports on customer preferences.
• Designed pipelines for migrating data from RDBMS to HDFS using Sqoop, reducing dependency on the RDBMS and in turn reducing cost by 40%.
• Automated the multi-source data extraction process using shell scripts and the Autosys tool, improving time efficiency by 50%.
• Led and mentored a team of 3 associates, helping them understand the project and work on migrating XML files to HDFS.