How to Create and Deploy a Streamlit App on AWS for Data Science Projects
In this tutorial, we will dive into the creation and deployment of a Streamlit app for data science projects, using Iris classification as an example. Streamlit is an open-source Python library that makes it easy to create and share beautiful, custom web apps for machine learning and data science. Before we jump into coding, let’s understand some key Streamlit components we will be using: sidebar, subheader, button, slider, caching, and session state.
But before that, if you want an idea of what a production-ready data science project and its code look like, check out this repository:
https://github.com/kshitijkutumbe/usa-visa-approval-prediction
Streamlit Components
1. Sidebar
The sidebar in Streamlit is a useful component for adding controls like sliders, buttons, or inputs. It’s a way to add interactivity to your app without cluttering the main page layout.
2. Subheader
Subheaders in Streamlit are used to organize content and make your app more readable. It’s a way to guide the user through the app’s flow and highlight different sections.
3. Button
Buttons in Streamlit can trigger actions. For example, after setting all your input parameters, a button could be used to start model inference.
4. Slider
Sliders provide a user-friendly way to input numerical values. In our app, they are used to set the four flower measurements; they could also be used to adjust a model's hyperparameters.
5. Caching
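Caching in Streamlit speeds up your app by storing the results of expensive computations so they are not recomputed on every rerun. Streamlit provides st.cache_data for data (such as loaded DataFrames) and st.cache_resource for shared objects such as machine learning models; both replace the deprecated st.cache. This is especially useful in data science projects, where you do not want to retrain a model every time a user interacts with the app.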
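Streamlit's caching decorators follow the same memoization idea as Python's functools.lru_cache: the first call with a given set of arguments runs the function, and later calls with the same arguments return the stored result. The stdlib-only sketch below illustrates that idea with lru_cache as a stand-in; it is not Streamlit's implementation, and `slow_train` and its `n_estimators` argument are hypothetical.

```python
import time
from functools import lru_cache

call_count = 0  # how many times the expensive body actually ran


@lru_cache(maxsize=None)
def slow_train(n_estimators: int) -> str:
    """Hypothetical stand-in for an expensive model-training step."""
    global call_count
    call_count += 1
    time.sleep(0.1)  # simulate slow work
    return f"model(n_estimators={n_estimators})"


first = slow_train(100)   # cache miss: runs the body
second = slow_train(100)  # cache hit: returns the stored result
third = slow_train(200)   # new argument: cache miss, runs again
print(call_count)  # 2
```

As with Streamlit's decorators, the cache is keyed on the arguments, so changing an input (here `n_estimators`) triggers a fresh computation.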
6. Session State
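Streamlit reruns your script from top to bottom on every interaction, so ordinary variables are reset each time. Session state (st.session_state) is a dictionary-like object that persists values across these reruns for each user session, which makes it useful for keeping track of user inputs or other state across interactions.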
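Because Streamlit re-executes the whole script on each interaction, a value kept in an ordinary variable starts over every time, while a value kept in session state accumulates. The toy model below is plain Python, not Streamlit: `run_script` is a hypothetical stand-in for one rerun of the script, and the `state` dict plays the role of st.session_state for a single browser session.

```python
def run_script(session_state: dict, clicked: bool) -> tuple[int, int]:
    """Toy stand-in for one top-to-bottom rerun of a Streamlit script."""
    clicks_local = 0                       # ordinary variable: recreated on every rerun
    session_state.setdefault("clicks", 0)  # initialized only on the first run
    if clicked:
        clicks_local += 1
        session_state["clicks"] += 1
    return clicks_local, session_state["clicks"]


# Simulate one session in which the user clicks a button three times.
state = {}
results = [run_script(state, clicked=True) for _ in range(3)]
print(results)  # [(1, 1), (1, 2), (1, 3)]
```

The `if key not in st.session_state` checks in the app below play the same role as `setdefault` here: they set a default only on the first run.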
Building the Streamlit App
Iris Classification Example
We will create a simple app for classifying Iris species using a machine learning model. The app will allow users to input features of an Iris flower and receive a prediction of the species.
Prerequisites
- Python installed on your system.
- Basic understanding of Python programming.
- Familiarity with machine learning concepts.
Step-by-Step Guide
1. Setting Up Your Environment
First, ensure that you have Streamlit installed. You can install it via pip:
pip install streamlit
2. Writing the Streamlit App
Now, let’s start writing our app. Create a new Python file named iris_app.py and add the following code:
import streamlit as st
from sklearn import datasets
from sklearn.ensemble import RandomForestClassifier

# Load the Iris dataset
iris = datasets.load_iris()
X = iris.data
y = iris.target


# Train the model once and cache it across reruns and sessions.
# st.cache_resource replaces the deprecated @st.cache for objects such as ML models.
@st.cache_resource
def train_model():
    clf = RandomForestClassifier(random_state=42)  # fixed seed for reproducible results
    clf.fit(X, y)
    return clf


# Classify a single Iris sample and return the species name
def classify_iris(model, sepal_length, sepal_width, petal_length, petal_width):
    prediction = model.predict([[sepal_length, sepal_width, petal_length, petal_width]])
    return iris.target_names[prediction][0]


def main():
    st.title("Iris Species Classifier")

    # Initialize session state with default feature values (first run only)
    defaults = {"sepal_length": 5.4, "sepal_width": 3.4, "petal_length": 1.6, "petal_width": 0.4}
    for key, value in defaults.items():
        if key not in st.session_state:
            st.session_state[key] = value

    # Sidebar for input features; each slider's key binds it to st.session_state[key]
    with st.sidebar:
        st.subheader("Input Features")
        st.slider("Sepal Length", min_value=4.0, max_value=8.0, key="sepal_length")
        st.slider("Sepal Width", min_value=2.0, max_value=4.5, key="sepal_width")
        st.slider("Petal Length", min_value=1.0, max_value=7.0, key="petal_length")
        st.slider("Petal Width", min_value=0.1, max_value=2.5, key="petal_width")

    # Main section
    st.subheader("Predicted Species")
    if st.button("Classify"):
        model = train_model()  # trained on the first call, then served from the cache
        species = classify_iris(
            model,
            st.session_state.sepal_length,
            st.session_state.sepal_width,
            st.session_state.petal_length,
            st.session_state.petal_width,
        )
        st.success(f"The Iris is predicted to be a {species}")


if __name__ == "__main__":
    main()
Explanation of the Code
- The train_model function trains a RandomForestClassifier on the Iris data, and classify_iris passes the four measurements to that model and returns the predicted species name.
- The sidebar is created using st.sidebar, where we place our sliders for the input features.
- st.button triggers the classification once the user has set the features.
- The train_model function is decorated with a Streamlit caching decorator (st.cache_resource in current versions; older tutorials use the deprecated @st.cache), which tells Streamlit to cache the trained model. The model is trained once and reused, which speeds up subsequent predictions.
- Session state (st.session_state) stores and retrieves the user's inputs. This makes the app more user-friendly, because it remembers the user's previous inputs across reruns.
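Because train_model and classify_iris contain no Streamlit calls apart from the caching decorator, the model logic can be checked outside the app before deploying. A minimal sketch, assuming scikit-learn is installed: it re-declares the two helpers without the decorator, and `random_state=42` is an assumption added so results are reproducible.

```python
from sklearn import datasets
from sklearn.ensemble import RandomForestClassifier

iris = datasets.load_iris()


def train_model():
    # Same training step as in iris_app.py, minus the Streamlit caching decorator
    clf = RandomForestClassifier(random_state=42)
    clf.fit(iris.data, iris.target)
    return clf


def classify_iris(model, sepal_length, sepal_width, petal_length, petal_width):
    prediction = model.predict([[sepal_length, sepal_width, petal_length, petal_width]])
    return iris.target_names[prediction][0]


model = train_model()
# The app's default slider values: a flower with short, narrow petals
species = classify_iris(model, 5.4, 3.4, 1.6, 0.4)
print(species)
```

Keeping the model code free of Streamlit calls like this also makes it easy to move into a separate module that both the app and a test suite import.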
Deploying Streamlit on AWS EC2
1. Set Up an AWS EC2 Instance
- Choose an Instance Type: Select an EC2 instance type that matches your app’s resource requirements (CPU, memory). For a simple Streamlit app like this one, a small instance such as t2.micro, t2.small, or t2.medium might be sufficient.
- Configure Security Group: Open only the ports the app needs. Streamlit serves on port 8501 by default, so allow inbound TCP traffic on that port. Also open port 22 for SSH access, ideally restricted to your own IP address.
- Launch and Connect: Once your instance is set up, connect to it via SSH.
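If you manage the security group from code, the two inbound rules described above can be expressed in the `IpPermissions` shape that boto3's `authorize_security_group_ingress` accepts. A minimal sketch: `build_ingress_rules` is a hypothetical helper, 203.0.113.10/32 is a documentation-only example address, and the actual boto3 call (which needs AWS credentials and a real group ID) is shown only as a comment.

```python
def build_ingress_rules(my_ip_cidr: str, app_port: int = 8501) -> list[dict]:
    """Inbound rules: the Streamlit port open to everyone, SSH only from my_ip_cidr."""
    return [
        {
            "IpProtocol": "tcp",
            "FromPort": app_port,
            "ToPort": app_port,
            "IpRanges": [{"CidrIp": "0.0.0.0/0", "Description": "Streamlit app"}],
        },
        {
            "IpProtocol": "tcp",
            "FromPort": 22,
            "ToPort": 22,
            "IpRanges": [{"CidrIp": my_ip_cidr, "Description": "SSH from my IP"}],
        },
    ]


rules = build_ingress_rules("203.0.113.10/32")
print([(r["FromPort"], r["IpRanges"][0]["CidrIp"]) for r in rules])
# With boto3 installed and credentials configured, the rules could be applied with:
#   boto3.client("ec2").authorize_security_group_ingress(GroupId="sg-...", IpPermissions=rules)
```

Restricting port 22 to a single /32 address keeps SSH closed to the rest of the internet while the app itself stays public.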
2. Install Required Software
- Python and Pip: Ensure Python and pip are installed. If they are not, install them with the package manager of your instance's Linux distribution (for example, dnf on Amazon Linux 2023 or apt on Ubuntu).
- Streamlit: Install Streamlit using pip: pip install streamlit
- Dependencies: Install any other dependencies your app requires. For the Iris app, that is scikit-learn: pip install scikit-learn
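Before starting the app, it can help to confirm that the required packages are installed for the same Python interpreter you will run Streamlit with. A small stdlib-only check; `installed_versions` is a hypothetical helper, and the package list assumes the Iris app's two dependencies.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_versions(packages):
    """Map each distribution name to its installed version, or None if missing."""
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


if __name__ == "__main__":
    # The distributions the Iris app needs
    for name, ver in installed_versions(["streamlit", "scikit-learn"]).items():
        print(f"{name}: {ver or 'NOT INSTALLED'}")
```

Note that `version` takes the distribution name used by pip (scikit-learn), not the import name (sklearn).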
3. Upload Your Streamlit App
- Transfer your Streamlit app files to your EC2 instance with scp (Secure Copy Protocol), an SFTP client, or by cloning your Git repository on the instance.
4. Run the Streamlit App
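- Run your Streamlit app with: streamlit run iris_app.py
- A process started this way stops when you close the SSH session. To keep the app running continuously, start it inside a terminal multiplexer such as tmux or screen, or run it under a process manager such as supervisord or systemd.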
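To show what a process manager does for you, here is a toy restart loop in plain Python. It is a sketch, not a substitute for supervisord or systemd (no log rotation, no start on boot); `build_streamlit_cmd` and `supervise` are hypothetical helpers, and the port, address, and headless settings are assumptions passed as standard Streamlit config flags.

```python
import subprocess
import sys
import time


def build_streamlit_cmd(app_path: str, port: int = 8501, address: str = "0.0.0.0") -> list[str]:
    # "python -m streamlit" uses the interpreter that has Streamlit installed
    return [
        sys.executable, "-m", "streamlit", "run", app_path,
        "--server.port", str(port),
        "--server.address", address,
        "--server.headless", "true",  # don't try to open a browser on the server
    ]


def supervise(cmd: list[str], max_restarts: int = 5, backoff_s: float = 2.0) -> int:
    """Run cmd and restart it each time it exits, up to max_restarts restarts.

    Returns the total number of runs.
    """
    runs = 0
    while True:
        runs += 1
        returncode = subprocess.call(cmd)  # blocks until the process exits
        if runs > max_restarts:
            return runs
        print(f"exited with code {returncode}; restarting in {backoff_s}s", file=sys.stderr)
        time.sleep(backoff_s)


# Usage on the instance (inside tmux, or as a last resort):
#   supervise(build_streamlit_cmd("iris_app.py"))
```

A real process manager adds what this loop lacks: starting the app when the instance boots, capturing logs, and stopping cleanly on shutdown.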
5. Accessing the App
- Your app should now be accessible via your EC2 instance’s public IP address or public DNS name, followed by the port number (e.g., http://ec2-x-x-x-x.compute-1.amazonaws.com:8501).
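To check from your own machine (or from a script) that the app is reachable, you can request Streamlit's health endpoint. A minimal stdlib sketch, assuming a recent Streamlit version that serves `/_stcore/health` (older versions used a different path); `app_url` and `is_app_up` are hypothetical helpers.

```python
from urllib.request import urlopen


def app_url(public_dns: str, port: int = 8501) -> str:
    return f"http://{public_dns}:{port}"


def is_app_up(base_url: str, timeout: float = 5.0) -> bool:
    """True if the Streamlit health endpoint answers with HTTP 200."""
    try:
        # Assumption: /_stcore/health is the health endpoint of recent Streamlit versions
        with urlopen(f"{base_url}/_stcore/health", timeout=timeout) as resp:
            return resp.status == 200
    except OSError:  # URLError, HTTPError, and timeouts are all OSError subclasses
        return False


# Example: print(is_app_up(app_url("ec2-x-x-x-x.compute-1.amazonaws.com")))
```

If this returns False while the app is running on the instance, the security group rule for port 8501 is the first thing to check.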
Precautions and Best Practices
1. Security
- Minimal Open Ports: Only open the ports that are absolutely necessary.
- Use a Firewall: Configure a firewall (like AWS’s security groups) to control the traffic to your instance.
- SSL/TLS: Serve the app over HTTPS with an SSL/TLS certificate to ensure secure data transmission, for example by placing a reverse proxy such as Nginx or an AWS Application Load Balancer in front of Streamlit.
2. Performance and Scaling
- Monitoring: Regularly monitor your EC2 instance’s performance with AWS CloudWatch. CPU and network metrics are collected by default; memory usage requires installing the CloudWatch agent on the instance.
- Scaling: Be prepared to scale if your app needs more resources, either vertically by moving to a larger instance type or horizontally with an Auto Scaling group. Use your monitoring data to decide when to scale.
- Load Balancing: For high-traffic apps, consider using a load balancer to distribute traffic across multiple instances.
3. Data Storage
- Separate Storage: For apps that require data storage, use AWS services like RDS for databases or S3 for file storage instead of storing data on the EC2 instance.
- Backups: Back up your instance and data regularly. EBS snapshots and AWS Backup can automate this.
4. Cost Management
- Choose the Right Instance: Select an instance type that balances performance and cost.
- Monitor Costs: Keep an eye on your AWS bill and use AWS’s budgeting tools to alert you when costs exceed your expectations.
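A quick back-of-the-envelope estimate helps when comparing instance types or deciding whether to stop the instance outside working hours. A minimal sketch of the arithmetic: the hourly rate is a hypothetical placeholder (look up the real on-demand price for your region and instance type), 730 hours is the usual 24 × 365 / 12 approximation of a month, and storage and data transfer are not included.

```python
HOURS_PER_MONTH = 730  # 24 h x 365 d / 12


def monthly_cost(hourly_rate_usd: float, hours: float = HOURS_PER_MONTH) -> float:
    """On-demand compute cost for one instance; excludes storage and data transfer."""
    return round(hourly_rate_usd * hours, 2)


rate = 0.02  # hypothetical hourly rate in USD, for illustration only
always_on = monthly_cost(rate)           # running 24/7
work_hours = monthly_cost(rate, 8 * 22)  # 8 hours a day, 22 working days
print(always_on, work_hours)  # 14.6 3.52
```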
5. Updates and Maintenance
- Regularly update your software to patch security vulnerabilities.
- Plan for downtime during maintenance and notify your users accordingly.
Deploying on AWS EC2 gives you flexibility and control, but it also requires a good understanding of cloud computing and AWS services. For simpler deployment, especially for small projects or prototypes, platforms like Streamlit Community Cloud (formerly Streamlit sharing) or Heroku might be more appropriate.
If you are interested in an end-to-end, detailed guide on using Streamlit in production, check out the link below: