Recommender installation and configuration
As of Totara 17, the recommender service (ml_recommender) is deprecated. Please refer to the Machine Learning Service documentation to find out more about how Totara recommendations work.
System requirements
PHP is not well suited to machine learning and neural network tasks, and most ML software is written for either the Python or the Java/.NET stacks. We have selected Python 3 for our machine learning stack, and further development will also be based on Python.
This means that, in addition to the standard requirements for a Totara installation, the server that processes ML data must have Python 3 installed together with all its dependencies.
The recommender engine can run on both Windows and Linux hosts that meet the following requirements:
- Python 3.6
- Python pip package installer
- CPU with multiple cores recommended
- 4GB RAM or more recommended
- GCC compiler (required to compile LightFM's Cython code)
- Either the same host that runs Totara, or a different host that shares a volume with the Totara instance
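Before installing anything, it can help to confirm that a candidate ML host meets these requirements. The following is a minimal sketch, not a tool shipped with Totara: the function names total_ram_bytes and check_requirements are hypothetical, the RAM check only works where os.sysconf is available (e.g. Linux), the core count is logical rather than physical, and the results describe whichever interpreter you run it with, so run it with the python3 binary you intend to use.

```python
"""Pre-flight check for a candidate ML host (hypothetical helper, not part of Totara)."""
import importlib.util
import os
import shutil
import sys

MIN_PYTHON = (3, 6)                       # from the requirements list above
RECOMMENDED_RAM_BYTES = 4 * 1024 ** 3     # "4GB RAM or more recommended"


def total_ram_bytes():
    """Physical RAM in bytes, or None where os.sysconf is unavailable (e.g. Windows)."""
    try:
        return os.sysconf("SC_PAGE_SIZE") * os.sysconf("SC_PHYS_PAGES")
    except (AttributeError, ValueError, OSError):
        return None


def check_requirements():
    """Map each requirement to True/False, or None when it cannot be determined."""
    ram = total_ram_bytes()
    cores = os.cpu_count() or 1           # logical cores, not physical
    return {
        "python_ok": sys.version_info[:2] >= MIN_PYTHON,
        "pip_ok": importlib.util.find_spec("pip") is not None,
        "gcc_ok": shutil.which("gcc") is not None,   # needed to build LightFM
        "multi_core": cores > 1,
        "ram_ok": None if ram is None else ram >= RECOMMENDED_RAM_BYTES,
    }


if __name__ == "__main__":
    for name, ok in check_requirements().items():
        status = "unknown" if ok is None else ("ok" if ok else "not met")
        print(f"{name:<12}{status}")
```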
Hardware requirements depend heavily on the amount and nature of the data in your instance. Indicative performance benchmarks for two data sets run on two different AWS instance types can be found below.
Python Installation
Installation on Debian-based Linux distributions:
sudo apt update
sudo apt install -y python3.7
sudo apt install -y python3-pip
sudo apt install -y python3-wheel
sudo apt install -y python3-venv
sudo apt install -y python3-dev
Installation on Red Hat-based distributions:
sudo yum update
sudo yum install -y python3.7
sudo yum install -y python3-pip
sudo yum install -y python3-wheel
sudo yum install -y python3-venv
sudo yum install -y python3-devel
Dependencies installation
The dependencies are listed in extensions/ml_recommender/python/requirements.txt and can be installed using pip3:
sudo pip3 install -r extensions/ml_recommender/python/requirements.txt
Make sure that the system user who will run the Python script has access to the installed dependencies: either install them system-wide, install them as that user, or use a virtual environment (virtualenv).
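Because the dependencies must be visible to the user who runs the script, a quick check run as that user (with that user's interpreter or virtualenv) can catch problems before the first cron run. This is a simplified sketch with hypothetical helper names: it needs Python 3.8+ for importlib.metadata (newer than the Python 3.6 listed above), it only reads plain "name==version" style lines rather than the full requirement syntax, it ignores version constraints, and package-name normalisation (hyphens vs underscores) varies between Python versions.

```python
"""Report which requirements.txt entries are not installed for the current user.

Simplified sketch: assumes Python 3.8+ and simple "name==version" lines;
it does not implement full requirement parsing or check versions.
"""
import re
from importlib import metadata

# Distribution name: leading run of letters, digits, ".", "_" or "-".
NAME_RE = re.compile(r"^\s*([A-Za-z0-9][A-Za-z0-9._-]*)")


def requirement_names(lines):
    """Extract distribution names, skipping comments, blanks and pip options."""
    names = []
    for line in lines:
        line = line.split("#", 1)[0].strip()    # drop trailing comments
        if not line or line.startswith("-"):    # skip blanks and options like -r
            continue
        match = NAME_RE.match(line)
        if match:
            names.append(match.group(1))
    return names


def missing_distributions(lines):
    """Return the names that importlib.metadata cannot find."""
    missing = []
    for name in requirement_names(lines):
        try:
            metadata.version(name)
        except metadata.PackageNotFoundError:
            missing.append(name)
    return missing
```

For example, calling missing_distributions on the opened extensions/ml_recommender/python/requirements.txt file returns an empty list when every listed distribution is visible to the current interpreter.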
System cron tasks (processing)
The next step is to install the cron tasks. The current implementation works in three steps, each run by a separate script:
- Export data (php server/ml/recommender/cli/export_data.php): should be run only when data needs to be processed (e.g. once a day).
- Process data with Python (eval "$(php server/ml/recommender/cli/recommender_command.php)"): can be run every 5 minutes; it checks whether the data has already been processed and exits if it has.
- Import data back to the server (php server/ml/recommender/cli/import_recommendations.php): can be run every 5 minutes; it checks whether the data has already been imported and exits if it has.
Run all these scripts as a user that has full access to the selected data folder.
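The two "every 5 minutes" steps are safe to schedule frequently because each one exits early when there is nothing new to do. The sketch below illustrates that pattern only; it is not how Totara's scripts are implemented, and the marker files export.done and processed.done, like the function names, are invented for the example.

```python
"""Sketch of an idempotent cron step that exits early when there is nothing new.

Hypothetical: EXPORT_MARKER and DONE_MARKER are invented file names; the real
Totara scripts use their own mechanism to detect already-processed data.
"""
import os
from pathlib import Path

EXPORT_MARKER = "export.done"      # hypothetical: written when the export finishes
DONE_MARKER = "processed.done"     # hypothetical: written when processing finishes


def needs_processing(data_dir):
    """True when an export exists that is newer than the last completed run."""
    export = Path(data_dir) / EXPORT_MARKER
    done = Path(data_dir) / DONE_MARKER
    if not export.exists():
        return False                   # nothing has been exported yet
    if not done.exists():
        return True                    # exported, never processed
    return os.path.getmtime(export) > os.path.getmtime(done)


def run_step(data_dir, process):
    """Run process(data_dir) only if needed; return True when it ran."""
    if not needs_processing(data_dir):
        return False                   # already processed: exit quietly
    process(data_dir)
    (Path(data_dir) / DONE_MARKER).touch()  # reached only if process() did not raise
    return True
```

Recording completion only after the work succeeds means a failed run is simply retried on the next 5-minute tick.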
App server and ML server are same host
If Python is configured on the same host, simply run the provided script:
30 1 * * * (cd /[your totara root path]/server/ml/recommender/cli; bash ./run.sh)
App server and ML server are different hosts
In this case, the whole process must be run as three separate cron tasks. The ML server needs at least the extensions folder of the Totara distribution, which contains the required Python script.
For ease of maintenance, it is advisable to mirror the folder structure of the Totara distribution and the data folder exactly on both hosts.
This option increases complexity. Instead of using separate hosts, consider installing a full Totara instance on the ML host and connecting it to a read-only database replica (slave) that is not exposed via the load balancer and serves only as an ML processor. Use the method below only when the ML server must not have access to the Totara database or must not have a Totara instance installed.
On a Totara host:
30 1 * * * (cd /[your totara root path]/; php server/ml/recommender/cli/export_data.php)
*/5 * * * * (cd /[your totara root path]/; php server/ml/recommender/cli/import_data.php)
On the Python host:
Obtain the Python execution command by running the following in the Totara root directory, and set its output aside:
php server/ml/recommender/cli/recommender_command.php
The resulting Python command will look similar to the following (see the parameter descriptions below):
'/usr/bin/python3.7' '/var/www/totara/src/work/reorg/extensions/ml_recommender/python/ml_recommender.py' --query 'hybrid' ...
The recommender_command.php script prints the full command needed to run the Python script for the recommendations process, based on the current configuration on the settings page.
If the host where Python runs has a different path mapped to the shared volume, adjust the --data_path parameter accordingly.
Example cron entry:
*/5 * * * * /usr/bin/python3.7 /var/www/totara/src/work/reorg/extensions/ml_recommender/python/ml_recommender.py --query 'hybrid' --result_count_user '25' --result_count_item '15' --threads '6' --data_path '/var/www/totara/data/recommender/data' --content_filtering 'True'
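recommender_command.php remains the authoritative way to obtain the command, but when the Python host mounts the shared volume at a different path, the --data_path value has to be rewritten. The sketch below shows one way to assemble such a command and remap the data path. The helper names and the /mnt/totara_data mount point are hypothetical; the flags and values are copied from the example cron entry, and every value is single-quoted in the same style.

```python
"""Assemble the Python host's cron command (hypothetical helper, see lead-in)."""
from pathlib import PurePosixPath


def sh_quote(value):
    """Wrap a value in single quotes for a POSIX shell, escaping embedded quotes."""
    return "'" + str(value).replace("'", "'\"'\"'") + "'"


def remap_data_path(path, totara_mount, python_mount):
    """Rewrite a path under totara_mount so it points under python_mount instead."""
    relative = PurePosixPath(path).relative_to(totara_mount)
    return str(PurePosixPath(python_mount) / relative)


def build_command(python_bin, script, options):
    """Return 'python' 'script' --flag 'value' ... in the order options are given."""
    parts = [sh_quote(python_bin), sh_quote(script)]
    for flag, value in options.items():
        parts.append("--" + flag + " " + sh_quote(value))
    return " ".join(parts)


if __name__ == "__main__":
    options = {
        "query": "hybrid",
        "result_count_user": 25,
        "result_count_item": 15,
        "threads": 6,
        # Assumed: the shared volume is mounted at /mnt/totara_data on the Python host.
        "data_path": remap_data_path("/var/www/totara/data/recommender/data",
                                     "/var/www/totara/data", "/mnt/totara_data"),
        "content_filtering": "True",
    }
    print(build_command(
        "/usr/bin/python3.7",
        "/var/www/totara/src/work/reorg/extensions/ml_recommender/python/ml_recommender.py",
        options))
```

For example, remap_data_path("/var/www/totara/data/recommender/data", "/var/www/totara/data", "/mnt/totara_data") gives /mnt/totara_data/recommender/data.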
Cron frequency
Generally, running the export process once a day should be adequate. However, the optimum frequency depends on many factors, for example how active users are on the site and how much new content is created each day.
Configuration
To enable recommender system checks, add the following line to your site's config.php file:
$CFG->preventexecpath = false;
Log in to the Totara instance and navigate to the settings page: https://your_url/admin/settings.php?section=ml_setting_recommender
The Python 3 binary path is used only to generate the executable command string (to be run via eval). Python is not run from the PHP environment.
Parameters of the ml_recommender.py script
--query
Processing mode:
- Full Hybrid (meta-data and content): utilises content data, item meta-data and user-item interaction data (longest processing time, highest granularity)
- Partial Hybrid: utilises item meta-data and user-item interaction data
- Matrix Factorisation: utilises only user-item interaction data (shortest processing time, lowest granularity)
Type: TEXT; setting: dropdown; default: Full Hybrid
--threads
Number of cores/threads that may be used by the recommendation library (should be fewer than the number of physical cores).
Type: NUMERIC; setting: dropdown; default: 1
--result_count_user
User result count: number of item-to-user recommendations to return.
Type: NUMERIC; setting: dropdown; default: 10
--result_count_item
Item result count: number of item-to-item recommendations to return.
Type: NUMERIC; setting: dropdown; default: 10
--data_path
Path to the exported data files.
Type: TEXT; setting: filesystem path; default: /totara_data_root/recommender/data
--interactions_period
The period of user-item interactions to which recommendations are limited, e.g. the previous week, previous 2 weeks or previous 4 weeks.
Type: TEXT; setting: dropdown; default: ??? weeks, months
--path_to_python
Location of the python3 executable on the system.
Type: TEXT; setting: filesystem path; default: blank (the administrator needs to install Python and enter its location)
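The parameter list maps naturally onto a command-line parser. The sketch below is a hypothetical argparse definition, not the real ml_recommender.py source: numeric types and defaults are read off the list above, while the accepted --query and --interactions_period values are not documented here and are left as free text. --path_to_python is left out on the assumption, suggested by the example cron entry, that it selects the interpreter at the start of the command rather than being passed to the script; flags such as --content_filtering are likewise not modelled.

```python
"""Hypothetical argparse definition mirroring the documented parameters (a sketch)."""
import argparse


def build_parser():
    parser = argparse.ArgumentParser(description="Recommender parameters (sketch)")
    parser.add_argument("--query",
                        help="processing mode; accepted values are not listed here")
    parser.add_argument("--threads", type=int, default=1,
                        help="threads for the library; keep below physical cores")
    parser.add_argument("--result_count_user", type=int, default=10,
                        help="item-to-user recommendations to return")
    parser.add_argument("--result_count_item", type=int, default=10,
                        help="item-to-item recommendations to return")
    parser.add_argument("--data_path", help="directory holding the exported data files")
    parser.add_argument("--interactions_period",
                        help="interaction window; format not documented here")
    return parser


def validate(args, physical_cores):
    """Return a list of problems with the parsed arguments (empty when none)."""
    problems = []
    if args.threads < 1:
        problems.append("--threads must be at least 1")
    elif physical_cores > 1 and args.threads >= physical_cores:
        problems.append("--threads should be less than the number of physical cores")
    for name in ("result_count_user", "result_count_item"):
        if getattr(args, name) < 1:
            problems.append("--" + name + " must be at least 1")
    if not args.data_path:             # assumption: the script cannot run without data
        problems.append("--data_path is required")
    return problems
```

Checking --threads against the physical core count is left to the caller, since os.cpu_count() reports logical cores.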
Benchmark results
The following indicative tests were run using only user-item interaction data processed through collaborative filtering (matrix factorisation). Memory requirements and run times increase when content-based filtering of extra features (i.e. user and item meta-data and/or content meta-data) is included, as in the hybrid processing modes.
Data sets used
| Data set | Users | Items | Interactions |
|---|---|---|---|
| 100K interactions | 941 | 1,685 | 99,894 |
| 25M interactions | 162,541 | 59,047 | 25,000,095 |
Benchmark results
| AWS instance | Cores | Memory | 100K interactions | 25M interactions |
|---|---|---|---|---|
| t3.medium | 2 | 4 GB | 1 min 20 sec | crashed (out of memory) |
| t3.xlarge | 4 | 16 GB | 1 min 0 sec | 10 hrs 35 min |
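To put the run times on a common footing, the tables can be converted into interactions processed per second. This is plain arithmetic on the published figures (the crashed t3.medium 25M run is omitted), not a sizing model.

```python
"""Throughput implied by the benchmark tables above (arithmetic only, not a sizing model)."""

# (interactions, run time in seconds), copied from the tables above
RUNS = {
    ("t3.medium", "100K"): (99_894, 80),
    ("t3.xlarge", "100K"): (99_894, 60),
    ("t3.xlarge", "25M"): (25_000_095, 10 * 3600 + 35 * 60),
}


def interactions_per_second(interactions, seconds):
    return interactions / seconds


for (instance, dataset), (n, secs) in RUNS.items():
    print(f"{instance:<10} {dataset:<5} {interactions_per_second(n, secs):8.0f} interactions/s")
```

On t3.xlarge this works out at roughly 1,665 interactions/s for the 100K set but about 656 interactions/s for the 25M set: the larger set has about 250 times as many interactions yet took about 635 times as long (38,100 s against 60 s). With only two data points, treat this as a caution against extrapolating run times linearly rather than as a scaling law.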