Data Analyst – MARC FILIAS

Here’s a project idea that you can publish on your website to showcase your skills as a Quantitative Analyst:

Project: “Stock Market Analysis and Prediction using Machine Learning”

Objective: To analyze the historical stock prices of a selected company and build a predictive model to forecast future stock prices using machine learning techniques.

Dataset: Choose a publicly traded company (e.g., Apple, Google, Amazon) and collect its historical stock price data from a reliable source such as Yahoo Finance or Quandl (now Nasdaq Data Link). You can also use a dataset from Kaggle or the UCI Machine Learning Repository.

Tasks:

Data Cleaning and Preprocessing: Clean and preprocess the data by handling missing values, converting date formats, and normalizing the data.
Exploratory Data Analysis (EDA): Perform EDA to understand the distribution of stock prices, identify trends, and visualize the data using plots and charts.
Feature Engineering: Extract relevant features from the data that can be used to build a predictive model, such as moving averages, exponential moving averages, and technical indicators (e.g., RSI, Bollinger Bands).
Model Building: Build a predictive model using a machine learning algorithm (e.g., Linear Regression, Decision Trees, Random Forest, LSTM) to forecast future stock prices.
Model Evaluation: Evaluate the performance of the model using metrics such as Mean Absolute Error (MAE), Mean Squared Error (MSE), and Root Mean Squared Percentage Error (RMSPE).
Hyperparameter Tuning: Perform hyperparameter tuning to optimize the model’s performance.
Backtesting: Backtest the model using historical data to evaluate its performance over time.

Deliverables:

A written report (approx. 2-3 pages) that includes:
- Introduction to the project and objectives
- Data description and EDA findings
- Model building and evaluation results
- Hyperparameter tuning and backtesting results
- Conclusion and recommendations
A Python code repository (e.g., GitHub) that includes:
- Data cleaning and preprocessing code
- EDA code
- Feature engineering code
- Model building and evaluation code
- Hyperparameter tuning and backtesting code
Visualizations (e.g., plots, charts, tables) that illustrate the findings and results.

Tips:

Use a clear and concise writing style in your report.
Use visualization libraries such as Matplotlib, Seaborn, or Plotly to create informative and engaging plots.
Use a version control system like Git to manage your code repository.
Consider using a Jupyter Notebook to document your code and results.

Skills demonstrated:

Data analysis and visualization
Machine learning model building and evaluation
Feature engineering and selection
Hyperparameter tuning and backtesting
Programming skills in Python
Data preprocessing and cleaning

By publishing this project on your website, you’ll be able to showcase your skills as a Quantitative Analyst and demonstrate your ability to analyze and predict stock prices using machine learning techniques.

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_absolute_error, mean_squared_error
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LSTM, Dense

# Load data
df = pd.read_csv('AAPL.csv')

# Convert date column to datetime
df['Date'] = pd.to_datetime(df['Date'])

# Set date as index and sort chronologically so rolling windows look backward in time
df.set_index('Date', inplace=True)
df.sort_index(inplace=True)

# Plot stock prices
plt.figure(figsize=(10,6))
plt.plot(df['Close'])
plt.title('AAPL Stock Prices')
plt.xlabel('Date')
plt.ylabel('Close Price')
plt.show()

# Calculate moving averages
df['MA_50'] = df['Close'].rolling(window=50).mean()
df['MA_200'] = df['Close'].rolling(window=200).mean()

# Calculate exponential moving averages
df['EMA_50'] = df['Close'].ewm(span=50, adjust=False).mean()
df['EMA_200'] = df['Close'].ewm(span=200, adjust=False).mean()

# Drop rows with NaN values
df.dropna(inplace=True)

# Plot moving averages
plt.figure(figsize=(10,6))
plt.plot(df['Close'], label='Close Price')
plt.plot(df['MA_50'], label='MA 50')
plt.plot(df['MA_200'], label='MA 200')
plt.plot(df['EMA_50'], label='EMA 50')
plt.plot(df['EMA_200'], label='EMA 200')
plt.title('AAPL Stock Prices with Moving Averages')
plt.xlabel('Date')
plt.ylabel('Close Price')
plt.legend()
plt.show()

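# Prepare data for modeling: predict the next day's close from today's indicators
# (the indicators already contain today's close, so predicting today's close would not be a forecast)
df['Target'] = df['Close'].shift(-1)
df.dropna(inplace=True)  # the last row has no next-day close
X = df[['MA_50', 'MA_200', 'EMA_50', 'EMA_200']]
y = df['Target']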

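# Split data chronologically into training and testing sets
# (shuffle=False keeps the test period strictly after the training period and avoids look-ahead leakage)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, shuffle=False)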

# Linear Regression model
lr_model = LinearRegression()
lr_model.fit(X_train, y_train)
lr_y_pred = lr_model.predict(X_test)

# Random Forest model
rf_model = RandomForestRegressor(n_estimators=100, random_state=42)
rf_model.fit(X_train, y_train)
rf_y_pred = rf_model.predict(X_test)

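# LSTM model
# Scale features and target with training-set statistics only; LSTMs train poorly on raw price levels
from sklearn.preprocessing import StandardScaler
x_scaler, y_scaler = StandardScaler(), StandardScaler()
X_train_lstm = x_scaler.fit_transform(X_train).reshape(-1, X_train.shape[1], 1)
X_test_lstm = x_scaler.transform(X_test).reshape(-1, X_test.shape[1], 1)
y_train_lstm = y_scaler.fit_transform(y_train.values.reshape(-1, 1))

# Each sample is fed as a short sequence with one step per feature column
lstm_model = Sequential()
lstm_model.add(LSTM(units=50, return_sequences=True, input_shape=(X_train_lstm.shape[1], 1)))
lstm_model.add(LSTM(units=50))
lstm_model.add(Dense(1))
lstm_model.compile(loss='mean_squared_error', optimizer='adam')
lstm_model.fit(X_train_lstm, y_train_lstm, epochs=500, batch_size=32)
# Map predictions back to price units and flatten to 1-D to match y_test
lstm_y_pred = y_scaler.inverse_transform(lstm_model.predict(X_test_lstm)).ravel()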

# Evaluate models
lr_mae = mean_absolute_error(y_test, lr_y_pred)
lr_mse = mean_squared_error(y_test, lr_y_pred)
rf_mae = mean_absolute_error(y_test, rf_y_pred)
rf_mse = mean_squared_error(y_test, rf_y_pred)
lstm_mae = mean_absolute_error(y_test, lstm_y_pred)
lstm_mse = mean_squared_error(y_test, lstm_y_pred)

print('Linear Regression MAE:', lr_mae)
print('Linear Regression MSE:', lr_mse)
print('Random Forest MAE:', rf_mae)
print('Random Forest MSE:', rf_mse)
print('LSTM MAE:', lstm_mae)
print('LSTM MSE:', lstm_mse)

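# Plot predicted values against their dates (sorted, in case the test rows are not in date order)
order = y_test.index.argsort()
test_dates = y_test.index[order]
plt.figure(figsize=(10,6))
plt.plot(test_dates, y_test.values[order], label='Actual Values')
plt.plot(test_dates, lr_y_pred[order], label='Linear Regression Predictions')
plt.plot(test_dates, rf_y_pred[order], label='Random Forest Predictions')
plt.plot(test_dates, np.ravel(lstm_y_pred)[order], label='LSTM Predictions')
plt.title('AAPL Stock Prices Predictions')
plt.xlabel('Date')
plt.ylabel('Close Price')
plt.legend()
plt.show()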
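
The backtesting task can be sketched as a walk-forward evaluation: the model is refit on an expanding window of past data and scored on the block of days that follows it. The example below is one possible version, not a definitive implementation. The helper names `rmspe` and `walk_forward_backtest`, the 5- and 20-day moving averages, and the synthetic price series are assumptions made for illustration. `TimeSeriesSplit` is scikit-learn's splitter for ordered data.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import TimeSeriesSplit


def rmspe(y_true, y_pred) -> float:
    """Root mean squared percentage error, expressed as a fraction of the actual value."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.sqrt(np.mean(((y_true - y_pred) / y_true) ** 2)))


def walk_forward_backtest(X, y, model, n_splits=5):
    """Fit on each expanding training window and score on the block that follows it."""
    rows = []
    splitter = TimeSeriesSplit(n_splits=n_splits)
    for fold, (train_idx, test_idx) in enumerate(splitter.split(X)):
        model.fit(X.iloc[train_idx], y.iloc[train_idx])
        pred = model.predict(X.iloc[test_idx])
        rows.append({
            "fold": fold,
            "train_end": X.index[train_idx[-1]],
            "test_end": X.index[test_idx[-1]],
            "MAE": mean_absolute_error(y.iloc[test_idx], pred),
            "RMSPE": rmspe(y.iloc[test_idx], pred),
        })
    return pd.DataFrame(rows)


# Synthetic, strictly positive prices stand in for real data (assumption).
rng = np.random.default_rng(0)
dates = pd.bdate_range("2018-01-01", periods=600)
close = pd.Series(100 * np.exp(rng.normal(0, 0.01, 600).cumsum()), index=dates)

frame = pd.DataFrame({
    "MA_5": close.rolling(5).mean(),
    "MA_20": close.rolling(20).mean(),
    "target": close.shift(-1),  # next day's close
}).dropna()

results = walk_forward_backtest(frame[["MA_5", "MA_20"]], frame["target"], LinearRegression())
print(results)
```

Reporting the per-fold table, rather than a single averaged score, shows whether the error is stable across periods or concentrated in one stretch of the history.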