Deploying the result is the last, and a very important, step in a machine learning project. After an algorithm is selected, it is trained to generate a model, and the model is deployed to the production environment so that machine learning can be used to solve practical problems. Once deployed, the model needs to be retrained regularly to keep it up to date and effective; a common recommendation is to update the model every three to six months.
Finding an algorithm that can generate a highly accurate model is not the final step of a machine learning project. In real projects, the generated model must be serialized and published to the production environment; when new data arrives, the saved model is deserialized and used to make predictions on that data.
1.1 Serializing and deserializing machine learning models with pickle
pickle is the standard Python serialization mechanism. It can be used to serialize the model generated by a machine learning algorithm and save it to a file. When predictions on new data are needed, the model saved in the file is deserialized and used to predict the results for the new data.
Below is an example that trains a logistic regression model on the Pima Indians dataset, serializes it to a file, and then deserializes the model and evaluates it. In machine learning projects, model serialization is particularly important when training takes a long time.
from pandas import read_csv
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from pickle import dump
from pickle import load
# Import data
filename = 'pima_data.csv'
names = ['preg', 'plas', 'pres', 'skin', 'test', 'mass', 'pedi', 'age', 'class']
data = read_csv(filename, names=names)
# Divide data into input data and output results
array = data.values
X = array[:, 0:8]
Y = array[:, 8]
test_size = 0.33
seed = 4
X_train, X_test, Y_train, Y_test = train_test_split(X, Y, test_size=test_size, random_state=seed)
# Train the model
model = LogisticRegression()
model.fit(X_train, Y_train)
# Save the model
model_file = 'finalized_model.sav'
with open(model_file, 'wb') as model_f:
    # Model serialization
    dump(model, model_f)
# Loading the model
with open(model_file, 'rb') as model_f:
    # Model deserialization
    loaded_model = load(model_f)
result = loaded_model.score(X_test, Y_test)
print("Algorithm evaluation result: %.3f%%" % (result * 100))
1.2 Serializing and deserializing machine learning models with joblib
joblib is part of the SciPy ecosystem and provides common tools for serializing and deserializing Python objects. When objects are serialized with joblib, the data is saved in NumPy format, which is very efficient for algorithms that store a lot of data inside the model, such as the K-Nearest Neighbors algorithm.
from pandas import read_csv
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from joblib import dump
from joblib import load
# Import data
filename = 'pima_data.csv'
names = ['preg', 'plas', 'pres', 'skin', 'test', 'mass', 'pedi', 'age', 'class']
data = read_csv(filename, names=names)
# Divide data into input data and output results
array = data.values
X = array[:, 0:8]
Y = array[:, 8]
test_size = 0.33
seed = 4
X_train, X_test, Y_train, Y_test = train_test_split(X, Y, test_size=test_size, random_state=seed)
# Train the model
model = LogisticRegression()
model.fit(X_train, Y_train)
# Save the model
model_file = 'finalized_model_joblib.sav'
with open(model_file, 'wb') as model_f:
    # Model serialization
    dump(model, model_f)
# Loading the model
with open(model_file, 'rb') as model_f:
    # Model deserialization
    loaded_model = load(model_f)
result = loaded_model.score(X_test, Y_test)
print("Algorithm evaluation result: %.3f%%" % (result * 100))