Data Analysis

 

Category 1: Data Cleaning & Exploratory Data Analysis (EDA)

 

1. Comprehensive Data Quality Report Generator
“Act as a senior data analyst. I have a dataset loaded into a pandas DataFrame named df. The dataset contains customer transaction data with the following columns: [column_names]. Generate a Python script that performs a comprehensive data quality assessment. The script must:

1. Profile each column (data type, count, nulls, unique values).

2. Identify and quantify missing values for each column.

3. Detect potential outliers in numerical columns using the IQR method.

4. Check for duplicate rows.

5. Summarize the findings in a markdown report, including specific recommendations for cleaning or imputation for each identified issue.”
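
For reference, a script produced by this prompt might be built around something like the sketch below. It covers only the profiling, IQR, and duplicate checks (not the markdown report), and the sample columns customer_id, amount, and channel are invented for illustration.

```python
import pandas as pd


def profile_columns(df: pd.DataFrame) -> pd.DataFrame:
    """One row per column: dtype, non-null count, null count/share, unique values."""
    return pd.DataFrame({
        "dtype": df.dtypes.astype(str),
        "non_null": df.notna().sum(),
        "nulls": df.isna().sum(),
        "null_pct": (df.isna().mean() * 100).round(1),
        "unique": df.nunique(),  # NaN is not counted as a value
    })


def iqr_outlier_counts(df: pd.DataFrame, k: float = 1.5) -> pd.Series:
    """Count values outside [Q1 - k*IQR, Q3 + k*IQR] in each numeric column."""
    numeric = df.select_dtypes(include="number")
    q1 = numeric.quantile(0.25)
    q3 = numeric.quantile(0.75)
    iqr = q3 - q1
    outside = (numeric < q1 - k * iqr) | (numeric > q3 + k * iqr)
    return outside.sum()


# Hypothetical sample data: one extreme amount, one missing channel,
# and one fully duplicated row (the last two rows are identical).
df = pd.DataFrame({
    "customer_id": [1, 2, 3, 4, 5, 6, 7, 7],
    "amount": [20.0, 22.0, 19.0, 21.0, 23.0, 500.0, 20.0, 20.0],
    "channel": ["web", "web", None, "store", "app", "web", "app", "app"],
})

profile = profile_columns(df)
outliers = iqr_outlier_counts(df)
duplicate_rows = int(df.duplicated().sum())

print(profile)
print(outliers)
print("duplicate rows:", duplicate_rows)
```

The multiplier k = 1.5 is the conventional IQR fence; whether a flagged value is an error or a legitimate large transaction still needs a human decision.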

 

2. Strategic Anomaly Detection Plan
“I am analyzing a time-series dataset of website traffic with columns timestamp, page_views, session_duration, and bounce_rate. I suspect there are unusual patterns related to marketing campaigns and technical glitches. Propose three distinct statistical or machine learning-based methods for anomaly detection (e.g., Isolation Forest, Seasonal-Trend decomposition, Z-score on rolling averages). For each method, explain its underlying logic, its pros and cons in this context, and provide the boilerplate Python code to implement it.”
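
As an illustration of the simplest of the three methods named above, here is a minimal sketch of a Z-score against a rolling baseline. The 7-point window, the threshold of 3, and the sample page_views values are assumptions chosen for the example, not recommendations.

```python
import pandas as pd


def rolling_zscore_anomalies(series: pd.Series, window: int = 7,
                             threshold: float = 3.0) -> pd.Series:
    """Flag points whose z-score against the preceding `window` points exceeds `threshold`."""
    # shift(1) keeps the current point out of its own baseline, so a spike
    # cannot inflate the mean/std it is compared against.
    baseline = series.shift(1).rolling(window, min_periods=window)
    z = (series - baseline.mean()) / baseline.std()
    # Points without a full baseline have z = NaN and are never flagged.
    return z.abs() > threshold


# Hypothetical daily traffic with a single spike on 2024-01-11.
page_views = pd.Series(
    [100, 102, 98, 101, 99, 103, 100, 102, 97, 101, 400, 100, 99, 102],
    index=pd.date_range("2024-01-01", periods=14, freq="D"),
    name="page_views",
)

flags = rolling_zscore_anomalies(page_views)
print(page_views[flags])
```

After a spike, the inflated rolling standard deviation can mask further anomalies until the spike leaves the window, which is one reason to compare this against the decomposition-based and Isolation Forest approaches.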

 

3. Advanced Feature Engineering Brainstorm
“I am building a model to predict customer churn. My dataset includes customer_id, join_date, last_login_date, total_purchases, average_order_value, and customer_service_tickets. Act as a data scientist and brainstorm a list of 10-15 powerful engineered features from this raw data. For each feature, provide the name, the logic for its creation (e.g., ‘Recency: days since last login’), its potential predictive value, and the pandas code to generate it.”

 

4. Hypothesis-Driven EDA Roadmap
“Our business goal is to increase user engagement on our mobile app. The key metrics in my dataset are daily_active_users, avg_session_length, feature_adoption_rate, and user_demographics. Generate a structured Exploratory Data Analysis (EDA) plan. The plan should be framed around 3 key business hypotheses. For each hypothesis, list the specific questions to ask, the data visualizations to create (e.g., scatter plot of session length vs. feature adoption), and the statistical tests to perform.”

 

5. Automated Cohort Analysis Script
“Generate a Python script to perform a user retention cohort analysis. Assume I have a dataset in a pandas DataFrame called df with customer_id, order_id, and order_date. The script should:

1. Assign each customer to a monthly acquisition cohort based on their first purchase date.

2. Calculate the number of unique customers from each cohort who returned in subsequent months.

3. Create a retention matrix (cohort table) showing the percentage of active users over time.

4. Use Seaborn to generate a heatmap visualization of the retention table.”
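
Steps 1 to 3 can be sketched roughly as follows, assuming order_date is already a datetime column. The sample orders are invented, and the heatmap step is left out so the block runs without plotting libraries.

```python
import pandas as pd


def retention_matrix(df: pd.DataFrame) -> pd.DataFrame:
    """Rows: acquisition month. Columns: months since acquisition. Values: % of cohort active."""
    orders = df.copy()
    orders["order_month"] = orders["order_date"].dt.to_period("M")
    # Step 1: cohort = month of each customer's first purchase.
    first_order = orders.groupby("customer_id")["order_date"].transform("min")
    orders["cohort"] = first_order.dt.to_period("M")
    # Whole months elapsed since the cohort month (0 = acquisition month).
    orders["period"] = (
        (orders["order_month"].dt.year - orders["cohort"].dt.year) * 12
        + (orders["order_month"].dt.month - orders["cohort"].dt.month)
    )
    # Step 2: unique returning customers per cohort and period.
    counts = (
        orders.groupby(["cohort", "period"])["customer_id"]
        .nunique()
        .unstack(fill_value=0)
    )
    # Step 3: divide by cohort size (period 0) to get percentages.
    return counts.div(counts[0], axis=0) * 100


# Hypothetical orders for four customers.
df = pd.DataFrame({
    "customer_id": [1, 1, 1, 2, 2, 3, 3, 4],
    "order_id": [101, 102, 103, 104, 105, 106, 107, 108],
    "order_date": pd.to_datetime([
        "2024-01-05", "2024-02-10", "2024-03-03",
        "2024-01-20", "2024-03-15",
        "2024-02-02", "2024-03-09",
        "2024-02-14",
    ]),
})

retention = retention_matrix(df)
print(retention)
```

For step 4, passing the result to seaborn, for example sns.heatmap(retention, annot=True, fmt=".0f"), is one common way to draw the heatmap. Note that recent cohorts show 0% in periods that have not happened yet; masking those cells avoids misreading them as churn.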

 

Category 2: Statistical Analysis & Modeling

 

6. Choosing the Right Statistical Test
“Act as a statistics expert. I need to compare two groups, A and B. Create a decision tree in markdown format that guides me to the correct statistical test. The decision tree should ask questions about:

– The type of data (continuous, categorical, binary).

– The distribution of the data (normal vs. non-normal).

– The relationship between the samples (independent vs. paired).

The final nodes of the tree should recommend the appropriate test (e.g., Independent T-test, Mann-Whitney U, Paired T-test, Chi-Squared, etc.) and state the null hypothesis ($H_0$) for each.”

 

7. Interpreting a Predictive Model for Stakeholders
“I have trained a Gradient Boosting classifier to predict which customers are likely to churn. The model has an accuracy of 88% and an AUC of 0.82. The top 5 features are contract_type, monthly_charges, tenure, internet_service_type, and total_charges. Write a concise, non-technical executive summary explaining what this model does, what the performance metrics mean in simple business terms, and the actionable insights derived from the most important features. Focus on the ‘so what?’ for a business leader.”

 

8. A/B Test Power Analysis Script
“I am planning an A/B test to see if a new button color increases the click-through rate (CTR). The current baseline CTR is 2.5%. I want to be able to detect a minimum relative uplift of 10% (i.e., a new CTR of 2.75%, an absolute difference of 0.25 percentage points). Generate a Python script using statsmodels to calculate the required sample size for each group to achieve a statistical power of 80% with a significance level (α) of 0.05. Explain the impact on sample size if we wanted to detect a smaller uplift.”
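
To sanity-check whatever script this prompt returns, the calculation can be approximated with the standard library alone. The sketch below is not the statsmodels script the prompt asks for: it uses Cohen's h as the effect size and the normal-approximation formula n = ((z_{1-α/2} + z_{power}) / h)², assuming a two-sided test and equal group sizes. A statsmodels version would typically use proportion_effectsize together with NormalIndPower and should land very close to these numbers.

```python
from math import asin, ceil, sqrt
from statistics import NormalDist


def sample_size_per_group(p_base: float, p_new: float,
                          alpha: float = 0.05, power: float = 0.80) -> int:
    """Approximate per-group sample size for a two-sided two-proportion z-test."""
    # Cohen's h: difference of arcsine-transformed proportions.
    h = 2 * asin(sqrt(p_new)) - 2 * asin(sqrt(p_base))
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_power = NormalDist().inv_cdf(power)
    return ceil(((z_alpha + z_power) / h) ** 2)


n_10pct = sample_size_per_group(0.025, 0.0275)    # 10% relative uplift
n_5pct = sample_size_per_group(0.025, 0.02625)    # 5% relative uplift

print("per group, 10% uplift:", n_10pct)
print("per group, 5% uplift:", n_5pct)
```

With these inputs the requirement is roughly 32,000 users per group. Because the sample size scales with 1/h², halving the detectable uplift roughly quadruples it.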

 

9. Root Cause Analysis Framework Application
“Our sales in the ‘Northeast’ region dropped by 15% last quarter. My dataset contains date, region, sales, ad_spend, competitor_promotions_active, and sales_rep_id. Apply a structured root cause analysis framework (like the ‘5 Whys’ or a fishbone diagram structure) to this problem. Generate a list of pointed questions and data queries (in pseudo-SQL) that would systematically investigate potential causes across different domains like marketing, operations, competition, and personnel.”

 

10. Time-Series Forecasting Model Comparison
“I need to forecast monthly sales for the next 6 months. My data has two columns: Month and Sales. Generate Python code that builds and evaluates two different forecasting models:

1. A classical statistical model (e.g., SARIMA).

2. A modern forecasting library model (e.g., Prophet by Meta, which fits an additive trend-plus-seasonality model).

The code should include a chronological train/test split, model training, forecasting, and a comparison of the two models on the held-out test period using Mean Absolute Error (MAE) and Root Mean Squared Error (RMSE).”
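
The two comparison metrics are simple enough to write out by hand, which helps when checking the generated code. In this sketch the actual and forecast values are invented purely to show how the metrics behave; they are not results from SARIMA or Prophet.

```python
from math import sqrt


def mae(actual, forecast):
    """Mean Absolute Error: average size of the misses."""
    return sum(abs(a - f) for a, f in zip(actual, forecast)) / len(actual)


def rmse(actual, forecast):
    """Root Mean Squared Error: penalizes large misses more heavily than MAE."""
    return sqrt(sum((a - f) ** 2 for a, f in zip(actual, forecast)) / len(actual))


# Hypothetical held-out months and two hypothetical forecasts.
actual = [100, 110, 120, 130]
forecast_a = [102, 108, 123, 129]   # small errors every month
forecast_b = [100, 110, 120, 140]   # perfect except one large miss

for name, forecast in [("A", forecast_a), ("B", forecast_b)]:
    print(name, "MAE:", mae(actual, forecast), "RMSE:", round(rmse(actual, forecast), 3))
```

Here forecast B has only a slightly higher MAE than A but a much higher RMSE, because RMSE squares the single large miss. Reporting both shows whether a model's errors are evenly spread or concentrated in a few bad months.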

 

Category 3: Data Visualization & Storytelling

 

11. Dashboard Design Blueprint
“I need to design a Power BI / Tableau dashboard for the marketing team to track campaign performance. The primary goal is to assess ROI and lead generation. Based on this, create a blueprint for the dashboard. Specify:

– The Key Performance Indicators (KPIs) to display prominently (e.g., Cost Per Lead, Conversion Rate).

– The types of charts to use for different data (e.g., line chart for trends, bar chart for channel comparison, map for regional performance).

– The interactive filters the user should have (e.g., Date Range, Campaign Name, Channel).

– The overall layout and story (e.g., top-left for high-level summary, drill-down details on the right).”

 

12. Crafting a Data-Driven Narrative Arc
“I have discovered that customer satisfaction scores are highly correlated with our support team’s response time. My analysis shows that for every 1-hour reduction in response time, CSAT scores increase by an average of 5%. Write a compelling data story based on this finding. Structure it with a clear narrative arc:

1. The Hook: Start with the problem (stagnant CSAT).

2. The Rising Action: Describe the investigation and the ‘aha!’ moment in the data.

3. The Climax: Present the key finding with the powerful statistic.

4. The Falling Action: Explain the ‘why’ behind the finding.

5. The Resolution: Propose a clear, data-backed recommendation with a quantified expected business impact.”

 

13. Generating Advanced Plotly Visualizations
“Generate a Python script using the Plotly Express library to create an interactive scatter plot. The plot should analyze the relationship between average_income, education_level, and product_spend from a customer dataset that also contains customer_id and number_of_household_members. The plot must:

– Use average_income for the x-axis and product_spend for the y-axis.

– Color the points by education_level.

– Size the points by number_of_household_members.

– Include hover-text that shows the customer_id.”

 

14. Anticipating and Answering Stakeholder Questions
“I am about to present my analysis which recommends shifting our marketing budget from Channel X to Channel Y. My analysis is sound, but I expect pushback from the team that manages Channel X. Act as a skeptical but fair business director. Generate a list of 5-7 challenging questions they might ask about my analysis. For each question, provide a concise, data-driven, and diplomatic response.”

 

15. Simplifying a Complex Metric
“Explain the concept of ‘Customer Lifetime Value (CLV)’ to a non-technical sales team. Use an analogy to make the definition intuitive. Break down how it’s calculated in simple terms (without complex formulas) and, most importantly, explain three specific actions they can take in their daily work to directly influence and increase CLV.”

 

Category 4: SQL & Database Operations

 

16. Advanced SQL Window Function Query
“I have a table named sales with columns sale_id, product_id, sale_date, and sale_amount. Write a SQL query that, for each product, calculates the month-over-month sales growth percentage. Use window functions (like LAG) to access the previous month’s sales data. The output should have three columns: product_id, sales_month, and mom_growth_pct.”
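
One possible shape for such a query is sketched below, wrapped in Python's built-in sqlite3 module so it can be run as-is. It assumes SQLite dialect (strftime for month truncation, window functions available from SQLite 3.25) and ISO-formatted date strings; other databases would use DATE_TRUNC or similar. The sample rows are invented.

```python
import sqlite3

QUERY = """
WITH monthly AS (
    -- Aggregate individual sales to one row per product per month.
    SELECT product_id,
           strftime('%Y-%m', sale_date) AS sales_month,
           SUM(sale_amount) AS monthly_sales
    FROM sales
    GROUP BY product_id, strftime('%Y-%m', sale_date)
),
with_prev AS (
    -- LAG fetches the previous row's total within each product.
    SELECT product_id, sales_month, monthly_sales,
           LAG(monthly_sales) OVER (
               PARTITION BY product_id ORDER BY sales_month
           ) AS prev_sales
    FROM monthly
)
SELECT product_id,
       sales_month,
       ROUND((monthly_sales - prev_sales) * 100.0 / prev_sales, 2) AS mom_growth_pct
FROM with_prev
ORDER BY product_id, sales_month;
"""

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sales (sale_id INTEGER, product_id INTEGER, "
    "sale_date TEXT, sale_amount REAL)"
)
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?, ?)",
    [
        (1, 1, "2024-01-05", 60.0),
        (2, 1, "2024-01-20", 40.0),
        (3, 1, "2024-02-11", 150.0),
        (4, 1, "2024-03-02", 120.0),
        (5, 2, "2024-01-15", 200.0),
        (6, 2, "2024-02-18", 100.0),
    ],
)

rows = conn.execute(QUERY).fetchall()
for row in rows:
    print(row)
```

The first month of each product has no previous value, so its growth is NULL. Also note that LAG returns the previous row, not the previous calendar month: if a product has a month with no sales, the comparison silently skips over the gap unless the months are filled in first (for example by joining to a calendar table).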

 

17. SQL Query Optimization Advice
“I have the following slow-running SQL query: [Paste your slow query here]. The tables are [table_names_and_brief_schema]. Act as a database administrator (DBA) and analyze this query for performance bottlenecks. Provide a list of actionable recommendations to optimize it, such as:

– Adding specific indexes.

– Rewriting a subquery as a Common Table Expression (CTE) or JOIN.

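– Rewriting WHERE clause conditions so they can use indexes (e.g., avoiding functions on indexed columns), since most query optimizers ignore the order in which conditions are written.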

– Using EXPLAIN to analyze the query plan.”

 

18. Data Mart Schema Design
“I’m tasked with creating a data mart for sales reporting. The source data comes from tables for customers, products, orders, and order_details. Design a simple star schema for this data mart. Describe the central fact table (e.g., fact_sales) and the surrounding dimension tables (e.g., dim_customer, dim_product, dim_date). List the columns that should be in each table, clearly identifying primary and foreign keys.”

 

Category 5: Workflow, Strategy & Professional Growth

 

19. Building a Reusable Analysis Template
“Create a markdown template for a standard data analysis project. The template should be structured like a Jupyter Notebook with clear sections and prompts for the analyst to fill in. Include sections for:

1. Problem Definition & Hypotheses

2. Data Import & Initial Inspection

3. Data Cleaning & Preprocessing

4. Exploratory Data Analysis & Visualization

5. Key Findings & Insights

6. Recommendations & Next Steps

7. Code Appendix”

 

20. Translating a Business Request into a Technical Plan
“My manager said: ‘I want to understand what drives customer loyalty. Can you look into the data?’ Translate this vague business request into a structured analytical plan. The plan should include:

1. A clear, measurable definition of ‘loyalty’ (e.g., repeat purchase rate, days since last purchase).

2. A list of data sources to investigate.

3. The main analytical methods you will use (e.g., regression analysis, customer segmentation).

4. The expected deliverables (e.g., a report, a dashboard).”

 

21. Prioritizing Analytical Requests
“I have a backlog of 5 data analysis requests from different departments. Act as a senior analyst and create a prioritization matrix to decide which project to tackle first. The matrix criteria should be Business Impact (scale of 1-5) and Effort/Complexity (scale of 1-5). Describe how you would score each request and how you would communicate your decision-making process to stakeholders.”

 

22. Python Script Refactoring for Efficiency
“I have a Python script for data processing that is slow and hard to read. It uses multiple for loops to iterate over a large DataFrame. Act as a senior Python developer and provide general principles and specific code examples for refactoring it. Focus on:

– Vectorization using pandas and NumPy to replace loops.

– Creating reusable functions to reduce code duplication.

– Adding comments and using descriptive variable names for readability.”
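
The first focus area can be illustrated with a small before-and-after sketch. The columns quantity and unit_price and the 10% discount rule for orders over 100 are hypothetical, chosen only to show a conditional calculation moving from a row loop to vectorized operations.

```python
import numpy as np
import pandas as pd


def add_total_loop(df: pd.DataFrame) -> pd.DataFrame:
    """Before: a Python-level loop over rows (slow on large DataFrames)."""
    totals = []
    for _, row in df.iterrows():
        total = row["quantity"] * row["unit_price"]
        if total > 100:
            total *= 0.9  # hypothetical 10% discount on large orders
        totals.append(total)
    out = df.copy()
    out["total"] = totals
    return out


def add_total_vectorized(df: pd.DataFrame) -> pd.DataFrame:
    """After: the same rule expressed on whole columns at once."""
    gross = df["quantity"] * df["unit_price"]
    out = df.copy()
    out["total"] = np.where(gross > 100, gross * 0.9, gross)
    return out


# Hypothetical sample data.
orders = pd.DataFrame({
    "quantity": [1, 5, 10],
    "unit_price": [20.0, 30.0, 5.0],
})

looped = add_total_loop(orders)
vectorized = add_total_vectorized(orders)
print(vectorized)
```

Keeping the loop version around temporarily and asserting that both functions agree on sample data is a cheap way to confirm a refactor did not change behavior.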

 

23. Crafting a Personal Development Plan
“I am a Data Analyst with 2 years of experience, proficient in SQL and Tableau. I want to transition towards a Data Scientist role in the next 18 months. Create a personalized learning plan for me. The plan should recommend:

– Key concepts to learn (e.g., advanced statistics, specific ML algorithms).

– Technical skills to acquire (e.g., Python libraries like scikit-learn and TensorFlow).

– Project ideas for a portfolio to demonstrate these new skills.

– Resources for learning (e.g., specific online courses, books, blogs).”

 

24. Quantifying the ROI of an Analysis Project
“My recent analysis identified an inefficient marketing campaign, and my recommendation led to reallocating $50,000 of the budget, which is projected to increase lead generation by 10%. Help me write a short paragraph for my performance review that quantifies the business impact of this project. Use the ‘Situation-Action-Result’ framework and focus on translating analytical work into tangible business value.”

 

25. Ethical Data Handling Checklist
“I am working with a customer dataset that contains personally identifiable information (PII) like names and emails, as well as sensitive demographic data. Create a short, actionable checklist for ensuring ethical and compliant data handling throughout my analysis project. The checklist should cover data anonymization, bias detection, secure storage, and principles for interpreting and presenting results responsibly.”