Deploying the result is the last, and a very important, step in a machine learning project. After an algorithm is selected, it is trained to generate a model, and the model is deployed to the production environment so that machine learning can be used to solve practical problems. Once deployed, the model needs to be retrained regularly to keep it up to date and effective; a common recommendation is to update the model every three to six months.
Finding an algorithm that can generate a highly accurate model is not the final step of a machine learning project. In real projects, the generated model must be serialized and published to the production environment; when new data arrives, the saved model is deserialized and used to make predictions on that data.
1.1 Serializing and deserializing machine learning models with pickle
pickle is the standard Python serialization mechanism. It can be used to serialize the model generated by a machine learning algorithm and save it to a file. When predictions on new data are needed, the model saved in the file is deserialized and used to predict the results for the new data.
Below is an example that trains a logistic regression model on the Pima Indians dataset, serializes it to a file, and then deserializes the model and evaluates it. In machine learning projects, model serialization is particularly important when training takes a long time.
from pandas import read_csv
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from pickle import dump
from pickle import load
# Import data
filename = 'pima_data.csv'
names = ['preg', 'plas', 'pres', 'skin', 'test', 'mass', 'pedi', 'age', 'class']
data = read_csv(filename, names=names)
# Divide data into input data and output results
array = data.values
X = array[:, 0:8]
Y = array[:, 8]
test_size = 0.33
seed = 4
X_train, X_test, Y_train, Y_test = train_test_split(X, Y, test_size=test_size, random_state=seed)
# Train the model
model = LogisticRegression()
model.fit(X_train, Y_train)
# Save the model
model_file = 'finalized_model.sav'
with open(model_file, 'wb') as model_f:
    # Model serialization
    dump(model, model_f)
# Loading the model
with open(model_file, 'rb') as model_f:
    # Model deserialization
    loaded_model = load(model_f)
result = loaded_model.score(X_test, Y_test)
print("Algorithm evaluation result: %.3f%%" % (result * 100))
1.2 Serializing and deserializing machine learning models with joblib
joblib is part of the SciPy ecosystem and provides common tools for serializing and deserializing Python objects. When objects are serialized with joblib, the data is saved in NumPy format, which is very efficient for algorithms that store a lot of data inside the model, such as the K-Nearest Neighbors algorithm.
from pandas import read_csv
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from joblib import dump
from joblib import load
# Import data
filename = 'pima_data.csv'
names = ['preg', 'plas', 'pres', 'skin', 'test', 'mass', 'pedi', 'age', 'class']
data = read_csv(filename, names=names)
# Divide data into input data and output results
array = data.values
X = array[:, 0:8]
Y = array[:, 8]
test_size = 0.33
seed = 4
X_train, X_test, Y_train, Y_test = train_test_split(X, Y, test_size=test_size, random_state=seed)
# Train the model
model = LogisticRegression()
model.fit(X_train, Y_train)
# Save the model
model_file = 'finalized_model_joblib.sav'
with open(model_file, 'wb') as model_f:
    # Model serialization
    dump(model, model_f)
# Loading the model
with open(model_file, 'rb') as model_f:
    # Model deserialization
    loaded_model = load(model_f)
result = loaded_model.score(X_test, Y_test)
print("Algorithm evaluation result: %.3f%%" % (result * 100))