About me
Hello there! I'm Orkhan, a Microsoft Certified Power BI Analyst focused on developing insightful dashboards and reports that support data-driven decision making. At Kapital Bank and ABB, I contributed to my teams' success by building dynamic, automated dashboards and uncovering hidden insights in data through extensive analysis. I am ready to contribute to the success of my next team!
Skills
Programming Languages and BI Tools: Python | R | Visual Studio | SQL Server | Oracle SQL | Databricks | MS Office Suite | Shell | Git | Bash | GitHub | Tableau | Power BI | BigQuery | Azure | Jira | Looker
Libraries: pandas | NumPy | Matplotlib | Seaborn | scikit-learn
Data Science: Linear Regression | Logistic Regression | A/B Testing | Association | Classification Models | Dimensionality Reduction | KNN
Professional Certifications
Featured Projects
Python | SQL | Tableau
Sales Revenue Analysis
In this project, I explored and analyzed the retail transactions of a US-based home and office supply company. Through the analysis, I tried to answer the following questions:
1) How many product categories does the company have and which product sub-categories bring in the most revenue?
2) Which segment of customers does the company sell most of its products to?
3) What is the most preferred method of shipment for its customers when placing an order?
4) How is the business performing across the states in terms of profit? In which states is it doing well, and in which does it need to improve?
The steps that I took to analyze the data:
1) Imported the data from Kaggle directly into Python using the Kaggle API's Python library.
2) Performed exploratory data analysis and data cleaning using the pandas and NumPy libraries.
3) Created an empty table named df_orders in SQL Server and loaded the dataset from Python directly into that table using the SQLAlchemy library.
4) Analyzed the data in SQL Server with window functions to uncover insights.
5) Connected SQL Server to Tableau Desktop to fetch the data, and developed the dashboard before publishing it to Tableau Public.
My key takeaways from the analysis:
1) The company needs to increase its efforts in Midwestern states such as Nebraska and South Dakota, as profits there are considerably lower than in states like California, Texas, and New York.
2) In terms of customer segments, the company should increase sales to the corporate segment, which has great potential for boosting profit because corporate customers usually buy office products in bulk.
3) The company might consider investing in the First Class and Second Class ship modes to increase customer satisfaction. Receiving deliveries on or ahead of time is likely to build customer loyalty and satisfaction, which are key to steady profit.
The dashboard is fully dynamic and can be set up for scheduled refreshes since it is connected directly to the data source. Please feel free to check out the dashboard and the code by clicking the Sales Revenue Tableau Dashboard and Sales Revenue Python and SQL Coding buttons below.
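A minimal sketch of how a window function (step 4) could answer question 1, assuming a simplified df_orders table with hypothetical columns category, sub_category, and sale_price. It uses Python's built-in sqlite3 module as a stand-in for SQL Server so it runs anywhere (window functions need SQLite 3.25 or later); the toy rows are illustrative only and are not the project's data. The CTE plus RANK() OVER (PARTITION BY ...) query is also valid T-SQL.

```python
import sqlite3

# Hypothetical, simplified schema and toy rows for illustration only;
# the real df_orders table in SQL Server has its own columns and data.
ROWS = [
    ("Furniture", "Chairs", 300.0),
    ("Furniture", "Tables", 450.0),
    ("Furniture", "Chairs", 200.0),
    ("Technology", "Phones", 800.0),
    ("Technology", "Copiers", 650.0),
    ("Office Supplies", "Binders", 120.0),
    ("Office Supplies", "Paper", 90.0),
]

# Total revenue per sub-category, then rank sub-categories within each category.
QUERY = """
WITH sub_revenue AS (
    SELECT category, sub_category, SUM(sale_price) AS revenue
    FROM df_orders
    GROUP BY category, sub_category
)
SELECT category, sub_category, revenue
FROM (
    SELECT category, sub_category, revenue,
           RANK() OVER (PARTITION BY category ORDER BY revenue DESC) AS rnk
    FROM sub_revenue
) AS ranked
WHERE rnk = 1
ORDER BY revenue DESC;
"""


def top_sub_category_per_category(rows):
    """Return (category, sub_category, revenue) for the top earner in each category."""
    conn = sqlite3.connect(":memory:")  # in-memory SQLite as a stand-in for SQL Server
    try:
        conn.execute(
            "CREATE TABLE df_orders (category TEXT, sub_category TEXT, sale_price REAL)"
        )
        conn.executemany("INSERT INTO df_orders VALUES (?, ?, ?)", rows)
        return conn.execute(QUERY).fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    for category, sub_category, revenue in top_sub_category_per_category(ROWS):
        print(f"{category}: {sub_category} ({revenue:.2f})")
```

RANK() keeps every sub-category tied for first place; ROW_NUMBER() would be the choice if exactly one row per category is wanted.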
Python | Tableau
US House Sales Price Project
The main goal of this project was to clean a dataset with many missing values and treat its outliers before visualizing it in Tableau Desktop for further insights. Through the analysis of this dataset, I tried to answer the following questions:
1) What was the mean house price, and how did it fluctuate from 2000 to 2022?
2) How was the mean house price distributed across the states? Which states have the highest mean house prices in the US?
3) What is the relationship between average house size and average house price across the 50 states?
4) What is the ratio of listings that were sold to listings that were for sale?
The steps that I took to analyze the data:
1) Imported the data into a Jupyter Notebook using the pandas library.
2) Performed EDA, handled missing values with mean, median, and mode imputation where relevant, and treated outliers using the IQR method.
3) Created an empty table named ushousenew in SQL Server and populated it with the dataset from Python using the SQLAlchemy library.
4) Connected SQL Server to Tableau Desktop to fetch the data, and developed the dashboard before publishing it to Tableau Public.
My key takeaways from the analysis:
1) US states differ significantly in mean house price, which ranges from as low as $215K to as high as $610K. The higher mean prices can be attributed to favorable geographic locations and the development of specific industries that bring money into those states.
2) States also differ significantly in the relationship between mean house size and mean house price. For instance, a 1,619-square-foot house costs $610K in California, while a similar-size house in Michigan costs around $230K, making the California house roughly 2.65 times as expensive.
3) The average house price also fluctuated noticeably over the period. Between 2008 and 2009, the mean price increased dramatically, which may be related to the Great Recession that adversely affected the US economy. Another noticeable shift occurred between 2020 and 2021, when the mean house price dropped, which may be explained by decreased interest in buying houses during the pandemic.
The dashboard is fully dynamic and can be set up for scheduled refreshes since it is connected directly to the data source. Please feel free to check out the dashboard and the code by clicking the US House Sales Price Tableau Dashboard and US House Sales Price Python Coding buttons below.
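The cleaning step of this project (mean, median, and mode imputation plus IQR-based outlier treatment) can be sketched with the standard library alone. This is an illustrative version, not the project's code: the helper names are hypothetical, missing values are assumed to be None, and outliers are capped at the Tukey fences (Q1 - 1.5 * IQR and Q3 + 1.5 * IQR) rather than dropped, which is only one common way of applying the IQR method. The "inclusive" quantile method matches the linear interpolation that pandas uses by default.

```python
import statistics

# Fill strategies for missing values; "mode" also works for categorical columns.
_FILLERS = {"mean": statistics.mean, "median": statistics.median, "mode": statistics.mode}


def impute(values, strategy="median"):
    """Replace None entries with the mean, median, or mode of the observed values."""
    observed = [v for v in values if v is not None]
    fill = _FILLERS[strategy](observed)
    return [fill if v is None else v for v in values]


def iqr_bounds(values, k=1.5):
    """Return the Tukey fences (Q1 - k*IQR, Q3 + k*IQR)."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def cap_outliers(values, k=1.5):
    """Clip values outside the IQR fences to the nearest fence."""
    lower, upper = iqr_bounds(values, k)
    return [min(max(v, lower), upper) for v in values]
```

For example, with toy prices in $K, `cap_outliers([200, 210, 220, 230, 240, 1000])` returns `[200, 210, 220, 230, 240, 275.0]`: the fences are 175 and 275, so only the extreme value is pulled in.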
Python
Restaurant Dataset Analysis
My main goal in carrying out the statistical analysis of the restaurant dataset was to use Python's statistics packages to uncover relationships between pairs of variables with the chi-square test and one-sample and two-sample t-tests.
Through the analysis, I tried to answer the following questions:
1) Is there any positive or negative correlation between pairs of numerical variables?
2) Do any two numerical variables demonstrate a strong linear relationship?
3) Is there any statistical association between meal temperature and serve type?
The steps that I took for analyzing the data:
1) Read seven CSV files into a Jupyter Notebook and merged six of them into a single DataFrame using the pandas merge method.
2) Plotted histograms of the variables with missing values to check their distributions before filling those values with mean imputation.
3) Computed a Pearson correlation matrix and created scatter plots to find correlations between numerical variables.
4) Explored the potential linear relationship between commission and monthly budget using a linear regression model.
5) Applied a chi-square test to the hot_cold and serve type variables to check for an association between them.
6) Carried out one-sample and two-sample t-tests on pairs of numerical variables.
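Step 5 can be illustrated with a minimal chi-square test of independence written in plain Python, assuming the hot_cold and serve type variables have already been cross-tabulated into a contingency table of counts. The function names and the counts in the example are hypothetical. The p-value uses the closed-form chi-square tail for integer degrees of freedom. Unlike SciPy's scipy.stats.chi2_contingency, which applies Yates' continuity correction by default when there is one degree of freedom, this sketch applies no correction.

```python
import math


def chi2_sf(x, k):
    """P(X > x) for a chi-square variable with k (integer) degrees of freedom."""
    if x <= 0:
        return 1.0
    half = x / 2.0
    if k % 2 == 0:
        # Even k: exp(-x/2) * sum_{i=0}^{k/2-1} (x/2)^i / i!
        term, total = 1.0, 1.0
        for i in range(1, k // 2):
            term *= half / i
            total += term
        return math.exp(-half) * total
    # Odd k: 1-df tail plus sqrt(2/pi) e^{-x/2} * sum x^{(2i-1)/2} / (1*3*...*(2i-1))
    total = math.erfc(math.sqrt(half))
    term = math.sqrt(2.0 * x / math.pi) * math.exp(-half)
    for i in range(1, (k + 1) // 2):
        total += term
        term *= x / (2 * i + 1)
    return total


def chi_square_independence(table):
    """Return (statistic, degrees_of_freedom, p_value) for a table of observed counts."""
    n_rows, n_cols = len(table), len(table[0])
    row_totals = [sum(row) for row in table]
    col_totals = [sum(table[r][c] for r in range(n_rows)) for c in range(n_cols)]
    grand_total = sum(row_totals)
    statistic = 0.0
    for r in range(n_rows):
        for c in range(n_cols):
            # Expected count under independence: row total * column total / grand total.
            expected = row_totals[r] * col_totals[c] / grand_total
            statistic += (table[r][c] - expected) ** 2 / expected
    dof = (n_rows - 1) * (n_cols - 1)
    return statistic, dof, chi2_sf(statistic, dof)
```

With made-up counts `[[10, 20], [20, 10]]`, the statistic is about 6.67 with one degree of freedom and a p-value of about 0.01, so independence would be rejected at the 0.05 level; a p-value of 0.05 or above corresponds to the "not enough evidence" conclusion.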
Key results of the statistical analysis:
1) According to the Pearson correlation matrix and scatter plots, the balance variable had a strong negative correlation with the commission, meal count, monthly budget, and order count variables.
2) Monthly budget and commission demonstrated a strong positive linear relationship, with a slope of 0.08 and an intercept of -0.17: for each one-unit increase in monthly budget, commission is expected to increase by 0.08.
3) According to the chi-square test, there is not enough evidence to conclude that serve type and meal temperature are associated.
Please feel free to check out the Python code by clicking the Statistical Analysis Python Coding button below.
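The correlation and regression results above rest on two small formulas: Pearson's r = S_xy / sqrt(S_xx * S_yy), and the least-squares line with slope S_xy / S_xx and intercept mean(y) - slope * mean(x). The sketch below implements them in plain Python (all function names are hypothetical) and uses the reported slope of 0.08 and intercept of -0.17 to show how the fitted line turns a monthly budget into an expected commission, in whatever units the dataset uses.

```python
import math


def _centered_sums(xs, ys):
    """Return (mean_x, mean_y, S_xy, S_xx, S_yy) for two paired samples."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length samples with at least 2 points")
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    s_xx = sum((x - mean_x) ** 2 for x in xs)
    s_yy = sum((y - mean_y) ** 2 for y in ys)
    return mean_x, mean_y, s_xy, s_xx, s_yy


def pearson_r(xs, ys):
    """Pearson correlation coefficient, between -1 and 1."""
    _, _, s_xy, s_xx, s_yy = _centered_sums(xs, ys)
    return s_xy / math.sqrt(s_xx * s_yy)


def ols_fit(xs, ys):
    """Least-squares line y = slope * x + intercept."""
    mean_x, mean_y, s_xy, s_xx, _ = _centered_sums(xs, ys)
    slope = s_xy / s_xx
    return slope, mean_y - slope * mean_x


def predict_commission(monthly_budget, slope=0.08, intercept=-0.17):
    """Expected commission from the fitted line reported in result 2."""
    return slope * monthly_budget + intercept
```

For instance, `predict_commission(100)` returns approximately 7.83, and raising the budget to 101 adds 0.08, which is exactly what the slope means.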