Yash Gaikwad
Portfolio

  • Data Analyst with an Engineer’s precision and a Strategist’s mindset
  • SQL | Python | Power BI | Tableau
  • MuleSoft Certified Developer

Data Cleaning on Company Layoffs Dataset

(SQL)

Faced with a messy and inconsistent layoffs dataset, I replicated the raw table into a staging table to preserve source integrity. I cleaned the data by removing exact duplicates, standardizing company and country names, correcting date formats, and handling nulls by propagating values from related rows. Through SQL queries and transformations, I normalized the dataset, prepared it for exploratory analysis, and ensured it was free from structural errors, establishing a clean foundation for further insights.
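A minimal sketch of these cleaning steps, run through Python's built-in sqlite3 so it works without a database server. The table name, columns, sample rows, and the MM/DD/YYYY source date format are hypothetical, and functions such as RTRIM and substr are SQLite's; other SQL engines offer dedicated date-parsing functions instead.

```python
import sqlite3

# Hypothetical sample rows standing in for the raw layoffs table:
# (company, industry, country, date as MM/DD/YYYY text, total_laid_off)
RAW_ROWS = [
    ("Acme ", "Tech", "United States.", "03/15/2023", 100),
    ("Acme", None, "United States", "11/01/2022", 40),
    ("Acme", None, "United States", "11/01/2022", 40),  # exact duplicate
    ("Beta", "Retail", "India", "01/02/2022", 50),
]


def clean_layoffs(rows):
    con = sqlite3.connect(":memory:")
    cur = con.cursor()
    cur.execute(
        "CREATE TABLE layoffs (company TEXT, industry TEXT, country TEXT,"
        " date TEXT, total_laid_off INTEGER)"
    )
    cur.executemany("INSERT INTO layoffs VALUES (?, ?, ?, ?, ?)", rows)

    # 1. Work on a staging copy so the raw table stays untouched.
    cur.execute("CREATE TABLE layoffs_staging AS SELECT * FROM layoffs")

    # 2. Remove exact duplicates: number identical rows, delete every copy after the first.
    cur.execute("""
        DELETE FROM layoffs_staging WHERE rowid IN (
            SELECT rid FROM (
                SELECT rowid AS rid,
                       ROW_NUMBER() OVER (
                           PARTITION BY company, industry, country, date, total_laid_off
                       ) AS rn
                FROM layoffs_staging
            ) WHERE rn > 1
        )
    """)

    # 3. Standardize names: trim stray spaces, drop trailing periods from countries.
    cur.execute(
        "UPDATE layoffs_staging SET company = TRIM(company), country = RTRIM(country, '.')"
    )

    # 4. Convert MM/DD/YYYY text to ISO YYYY-MM-DD so dates sort and group correctly.
    cur.execute("""
        UPDATE layoffs_staging
        SET date = substr(date, 7, 4) || '-' || substr(date, 1, 2) || '-' || substr(date, 4, 2)
        WHERE date LIKE '__/__/____'
    """)

    # 5. Fill a missing industry from another row of the same company.
    cur.execute("""
        UPDATE layoffs_staging
        SET industry = (
            SELECT t2.industry FROM layoffs_staging AS t2
            WHERE t2.company = layoffs_staging.company AND t2.industry IS NOT NULL
            LIMIT 1
        )
        WHERE industry IS NULL
    """)

    result = cur.execute(
        "SELECT company, industry, country, date, total_laid_off"
        " FROM layoffs_staging ORDER BY company, date"
    ).fetchall()
    con.close()
    return result
```

Running `clean_layoffs(RAW_ROWS)` leaves three rows: the duplicate is gone, "Acme " and "United States." are standardized, dates are in ISO form, and the second Acme row inherits the "Tech" industry.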

Exploratory Data Analysis on Layoffs Dataset

(SQL)

To analyze the global wave of company layoffs, I performed a comprehensive exploratory data analysis of a real-world dataset using SQL. I handled missing values, standardized fields, and added derived columns such as total employee count. I then wrote queries to uncover trends by industry, company size, and layoff stage, including yearly breakdowns and rolling monthly totals. The project highlights my ability to turn raw HR data into strategic insights on workforce-reduction patterns.
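The yearly breakdowns and rolling monthly totals can be sketched as below, again using Python's built-in sqlite3. The table layout and sample rows are hypothetical; the rolling total is a window-function running sum over monthly aggregates.

```python
import sqlite3

# Hypothetical cleaned rows: (company, industry, ISO date, total_laid_off)
ROWS = [
    ("Acme", "Tech", "2022-11-01", 40),
    ("Beta", "Retail", "2022-11-20", 10),
    ("Gamma", "Retail", "2023-01-05", 30),
    ("Acme", "Tech", "2023-03-15", 100),
]


def load(rows):
    con = sqlite3.connect(":memory:")
    con.execute(
        "CREATE TABLE layoffs (company TEXT, industry TEXT, date TEXT, total_laid_off INTEGER)"
    )
    con.executemany("INSERT INTO layoffs VALUES (?, ?, ?, ?)", rows)
    return con


def yearly_by_industry(con):
    # Total layoffs per year and industry, largest first within each year.
    return con.execute("""
        SELECT substr(date, 1, 4) AS year, industry, SUM(total_laid_off) AS total
        FROM layoffs
        GROUP BY year, industry
        ORDER BY year, total DESC
    """).fetchall()


def rolling_monthly_totals(con):
    # Aggregate to months first, then take a running SUM over the ordered months.
    return con.execute("""
        WITH monthly AS (
            SELECT substr(date, 1, 7) AS month, SUM(total_laid_off) AS laid_off
            FROM layoffs
            GROUP BY month
        )
        SELECT month, laid_off, SUM(laid_off) OVER (ORDER BY month) AS rolling_total
        FROM monthly
        ORDER BY month
    """).fetchall()
```

Aggregating to months in a CTE before applying the window keeps each month to a single row, so the running sum advances exactly once per month.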

AI Research Agent

(Python, OpenAI)

I designed and implemented an AI-powered research agent that combines large language models with DuckDuckGo search and Wikipedia lookups to deliver structured, source-backed topic summaries. By pairing prompt engineering with schema validation through Pydantic models, I built a system that returns consistently structured, reliable outputs across a wide range of queries. This project demonstrates how LLMs can be turned from unstructured responders into dependable tools for knowledge retrieval, real-time decision support, and research frameworks that extend to multiple domains.
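The idea of validating model output against a schema can be sketched without extra dependencies. This version swaps Pydantic for a standard-library dataclass plus explicit checks, and the field names (`topic`, `summary`, `sources`, `tools_used`), the retry-with-feedback loop, and the `call_model` callable are all hypothetical stand-ins for the real agent.

```python
import json
from dataclasses import dataclass, field


@dataclass
class ResearchResponse:
    # Hypothetical schema for one research summary.
    topic: str
    summary: str
    sources: list[str] = field(default_factory=list)
    tools_used: list[str] = field(default_factory=list)


def parse_research_response(raw: str) -> ResearchResponse:
    """Parse an LLM reply and reject anything that does not match the schema."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        raise ValueError(f"reply is not valid JSON: {exc}") from exc
    if not isinstance(data, dict):
        raise ValueError("reply must be a JSON object")
    for name in ("topic", "summary"):
        if not isinstance(data.get(name), str) or not data[name].strip():
            raise ValueError(f"field {name!r} must be a non-empty string")
    for name in ("sources", "tools_used"):
        value = data.get(name, [])
        if not isinstance(value, list) or not all(isinstance(v, str) for v in value):
            raise ValueError(f"field {name!r} must be a list of strings")
    return ResearchResponse(
        topic=data["topic"],
        summary=data["summary"],
        sources=list(data.get("sources", [])),
        tools_used=list(data.get("tools_used", [])),
    )


def ask_with_retries(call_model, prompt: str, max_attempts: int = 3) -> ResearchResponse:
    """call_model is a hypothetical callable (prompt -> reply text) wrapping the LLM and its tools."""
    last_error = None
    for _ in range(max_attempts):
        reply = call_model(prompt)
        try:
            return parse_research_response(reply)
        except ValueError as exc:
            last_error = exc
            # Feed the validation error back so the next reply can correct itself.
            prompt = f"{prompt}\n\nYour previous reply was invalid ({exc}). Reply with JSON only."
    raise ValueError(f"no valid reply after {max_attempts} attempts: {last_error}")
```

With Pydantic, the manual checks in `parse_research_response` would collapse into a model definition plus its built-in validation; the retry loop is one possible way to act on a validation failure.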

Sales Forecasting with ML

(Python, Linear Regression)

I developed a machine learning solution to forecast retail sales from more than 900,000 historical records. I transformed raw transactional data into a supervised learning dataset with lag features, applied stationarity transformations, and trained a Linear Regression model that delivered highly precise monthly demand predictions. Plotting the forecasts against actual sales revealed seasonal patterns and post-peak slowdowns, producing a forecasting framework that surfaces actionable insights for inventory planning and financial modeling in supply chain and retail environments.
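A dependency-free sketch of the lag-feature approach, assuming the records have already been aggregated into a list of monthly totals. First-order differencing is used here as one common stationarity transformation (an assumption, not necessarily the project's exact choice), and ordinary least squares is solved by hand instead of with a library model.

```python
def difference(series):
    """First-order differencing, one common way to remove trend before modeling."""
    return [b - a for a, b in zip(series, series[1:])]


def make_lag_dataset(values, n_lags):
    """Turn a 1-D series into supervised (features, target) pairs from the previous n_lags values."""
    X, y = [], []
    for i in range(n_lags, len(values)):
        X.append(values[i - n_lags:i])
        y.append(values[i])
    return X, y


def fit_linear_regression(X, y):
    """Ordinary least squares via the normal equations (X^T X) w = X^T y.

    An intercept column is prepended, so w[0] is the intercept.
    Assumes the lag features are not perfectly collinear.
    """
    rows = [[1.0] + list(x) for x in X]
    k = len(rows[0])
    # Augmented matrix [X^T X | X^T y].
    A = [
        [sum(r[i] * r[j] for r in rows) for j in range(k)]
        + [sum(r[i] * t for r, t in zip(rows, y))]
        for i in range(k)
    ]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(k):
            if r != col:
                factor = A[r][col] / A[col][col]
                A[r] = [a - factor * b for a, b in zip(A[r], A[col])]
    return [A[i][k] / A[i][i] for i in range(k)]


def forecast_next(series, n_lags=2):
    """One-step-ahead forecast: model the differenced series, then undo the differencing."""
    diffs = difference(series)
    X, y = make_lag_dataset(diffs, n_lags)
    w = fit_linear_regression(X, y)
    next_diff = w[0] + sum(wi * xi for wi, xi in zip(w[1:], diffs[-n_lags:]))
    return series[-1] + next_diff
```

The model predicts the month-over-month change rather than the raw level, and the last observed value is added back to turn that change into a sales forecast.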

Movies Data Analysis

(Python)

To understand what makes a movie successful, I conducted an in-depth analysis of a dataset covering thousands of films using Python. I cleaned, transformed, and visualized the data with Pandas, Seaborn, and Matplotlib, exploring relationships between ratings, genres, budgets, and box-office revenue. Through correlation analysis and custom visuals, I identified key drivers behind highly rated or financially successful movies. The results provided data-driven insights valuable for both production planning and streaming-platform recommendations.
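The correlation step can be sketched as follows: compute Pearson's r between each candidate feature and a target such as revenue, then rank features by the strength of that relationship. The column names and sample records are hypothetical; in Pandas, `DataFrame.corr()` (Pearson by default) produces the same coefficients for a whole table at once.

```python
from math import sqrt
from statistics import mean


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length numeric sequences."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def rank_drivers(records, target, features):
    """Rank features by the absolute value of their correlation with target."""
    ys = [rec[target] for rec in records]
    scores = [(f, pearson([rec[f] for rec in records], ys)) for f in features]
    return sorted(scores, key=lambda item: abs(item[1]), reverse=True)


# Hypothetical film records; the real dataset's columns may differ.
MOVIES = [
    {"budget": 10, "gross": 25, "score": 7.0},
    {"budget": 20, "gross": 45, "score": 6.0},
    {"budget": 30, "gross": 65, "score": 8.0},
    {"budget": 40, "gross": 85, "score": 5.0},
]
```

Ranking by absolute value treats strong negative relationships as just as informative as positive ones; correlation alone still says nothing about causation.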

Automating Crypto Data

(Python)

Tracking real-time crypto metrics manually is inefficient and error-prone. I set out to automate the collection of up-to-date crypto data from an API for analysis and reporting. I built a Python script that connects to a crypto API, extracts the relevant data, transforms the JSON response with Pandas, and cleans it into an analyzable format. The data is then sorted, filtered, and exported to CSV. The solution enables automated, real-time data pulls with no manual effort, creating a reliable pipeline for market monitoring, trend analysis, and dashboard integration.
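A standard-library sketch of the extract, flatten, filter, and export pipeline. The payload shape (`data`, `quote`, `USD`, `price`, `percent_change_24h`) and any endpoint URL or API-key header are hypothetical placeholders, not a specific provider's API. The original script used Pandas for the flattening step, where `pandas.json_normalize` plays the role of `flatten` below.

```python
import csv
import io
import json
import urllib.request


def fetch_json(url, headers=None, timeout=10):
    """Download and decode a JSON payload. The endpoint and headers are placeholders."""
    request = urllib.request.Request(url, headers=headers or {})
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)


def flatten(payload):
    """Pull the fields of interest out of a nested (hypothetical) listings payload."""
    rows = []
    for coin in payload.get("data", []):
        usd = coin.get("quote", {}).get("USD", {})
        rows.append({
            "name": coin.get("name"),
            "symbol": coin.get("symbol"),
            "price_usd": usd.get("price"),
            "change_24h_pct": usd.get("percent_change_24h"),
        })
    return rows


def clean_sort_filter(rows, min_price=0.0):
    """Drop rows without a price, keep those at or above min_price, sort by 24h change (best first)."""
    kept = [r for r in rows if r["price_usd"] is not None and r["price_usd"] >= min_price]
    return sorted(kept, key=lambda r: r["change_24h_pct"] or 0.0, reverse=True)


def to_csv_text(rows):
    """Serialize the cleaned rows to CSV text, ready to save as a .csv file."""
    if not rows:
        return ""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=list(rows[0].keys()))
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()
```

Keeping fetching separate from transformation means the cleaning and export steps can be tested on a saved sample payload without hitting the network, and the whole chain can then be scheduled to run unattended.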