Recommender installation and configuration

As of Totara 17, the recommender service (ml_recommender) is deprecated. Please refer to the Machine Learning Service documentation to find out more about how Totara recommendations work.

System requirements

PHP is not well suited to machine learning and neural network tasks. There are many reasons for this, but in practice most ML-related software is written for the Python, Java or .NET stacks. We have selected Python3 for our machine learning stack, and further development will also be based on Python.

This means that, in addition to the standard requirements for a Totara installation, the server that processes ML data must have Python3 installed with all its dependencies.

The recommender engine can run on both Windows and Linux hosts that meet the following requirements:

  • Python 3.6
  • Python pip package installer
  • CPU with multiple cores recommended
  • 4GB RAM or more recommended
  • GCC compiler (required for LightFM cython compilation)
  • Either the same host that runs Totara, or a different host that shares a volume with the Totara instance
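
The software items in the list above can be checked with a short preflight script. The following is a minimal sketch, not part of the Totara distribution: the function name check_requirements is hypothetical, it assumes that Python 3.6 is a minimum version rather than an exact one, and it does not check RAM or the shared volume.

```python
import os
import shutil
import sys

MIN_PYTHON = (3, 6)  # assumption: the listed version is treated as a minimum


def check_requirements(version_info=sys.version_info, which=shutil.which,
                       cpu_count=os.cpu_count):
    """Return a list of problems found; an empty list means every check passed."""
    problems = []
    if tuple(version_info[:2]) < MIN_PYTHON:
        problems.append("Python %d.%d or newer is required" % MIN_PYTHON)
    if which("pip3") is None and which("pip") is None:
        problems.append("pip was not found on PATH")
    if which("gcc") is None:
        problems.append("gcc was not found on PATH (needed to compile LightFM)")
    cores = cpu_count() or 1  # cpu_count() may return None
    if cores < 2:
        problems.append("only one CPU core detected; multiple cores are recommended")
    return problems


if __name__ == "__main__":
    for problem in check_requirements():
        print("WARNING:", problem)
```

Run the script as the same system user that will run the recommender, so that the PATH lookups reflect what that user can actually execute.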

Hardware requirements depend heavily on the amount and nature of the data in your instance. As a guide, we have prepared an indicative performance benchmark using two different data sets on two different AWS instances; the results can be found below.

Python Installation

Installation on Debian-based Linux distributions:

sudo apt update
sudo apt install -y python3.7
sudo apt install -y python3-pip
sudo apt install -y python3-wheel
sudo apt install -y python3-venv
sudo apt install -y python3-dev


Installation on Red Hat-based distributions:

sudo yum update
sudo yum install -y python3.7
sudo yum install -y python3-pip
sudo yum install -y python3-wheel
sudo yum install -y python3-venv
sudo yum install -y python3-devel

Dependencies installation

Dependencies are listed in extensions/ml_recommender/python/requirements.txt and can be installed using pip3:

sudo pip3 install -r extensions/ml_recommender/python/requirements.txt

Make sure that the system user who will run the Python script has access to the installed dependencies. Either install them system-wide, install them as that user, or use a virtualenv.
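
For the virtualenv option, the commands involved can be sketched as follows. This is an illustration under assumptions, not an official installation procedure: the helper name venv_install_commands and the virtualenv location are hypothetical, and the interpreter is assumed to be available as python3 (which may not be the case on Windows).

```python
import os
import shlex


def venv_install_commands(venv_dir, requirements_path):
    """Build the commands that create a virtualenv and install the dependencies into it."""
    # The virtualenv keeps its executables in "Scripts" on Windows and "bin" elsewhere.
    bin_dir = "Scripts" if os.name == "nt" else "bin"
    venv_python = os.path.join(venv_dir, bin_dir, "python")
    return [
        ["python3", "-m", "venv", venv_dir],
        [venv_python, "-m", "pip", "install", "-r", requirements_path],
    ]


if __name__ == "__main__":
    # Print the commands rather than running them, so they can be reviewed first.
    commands = venv_install_commands(
        "/opt/totara-ml-venv",  # hypothetical location
        "extensions/ml_recommender/python/requirements.txt",
    )
    for command in commands:
        print(" ".join(shlex.quote(part) for part in command))
```

If a virtualenv is used, the Python3 binary path configured in Totara would need to point at the interpreter inside that virtualenv.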

System cron tasks (processing)

The next step is to install the cron tasks. The current implementation works in three steps, each of which is run by a separate script:

  1. Export data (php server/ml/recommender/cli/export_data.php): run only when data needs to be processed (e.g. once a day).

  2. Process data with Python (eval php server/ml/recommender/cli/recommender_command.php): can be run every 5 minutes. It checks whether the data has already been processed and exits if it has.

  3. Import data back to the server (php server/ml/recommender/cli/import_recommendations.php): can be run every 5 minutes. It checks whether the data has already been imported and exits if it has.

Run all of these scripts as a user that has full access to the selected folder.
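
The order of the three steps can be sketched in Python as shown below. This is an illustrative sketch only and does not reproduce the contents of the bundled run.sh: the function name run_pipeline is hypothetical, and it assumes that recommender_command.php prints a single plain command line with no shell operators.

```python
import shlex
import subprocess

CLI_DIR = "server/ml/recommender/cli"


def run_pipeline(totara_root, run=subprocess.run):
    """Run export, Python processing and import in order, stopping at the first failure."""
    # Step 1: export data from Totara.
    run(["php", CLI_DIR + "/export_data.php"], cwd=totara_root, check=True)

    # Step 2: ask Totara for the Python command line, then execute it.
    emitted = run(["php", CLI_DIR + "/recommender_command.php"], cwd=totara_root,
                  check=True, stdout=subprocess.PIPE, universal_newlines=True)
    python_command = shlex.split(emitted.stdout)
    run(python_command, cwd=totara_root, check=True)

    # Step 3: import the recommendations back into Totara.
    run(["php", CLI_DIR + "/import_recommendations.php"], cwd=totara_root, check=True)
    return python_command
```

Because every call uses check=True, a failed export prevents the later steps from running on stale or missing data.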

App server and ML server are same host

If Python is configured on the same host as Totara, simply run the prepared script from cron:

30 1 * * * (cd /[your totara root path]/server/ml/recommender/cli; bash ./run.sh)

App server and ML server are different hosts

In this case, the whole process must be executed as three separate cron tasks. The ML server needs at least the extensions folder of the Totara distribution, which contains the required Python script.

For ease of maintenance, we advise mirroring the folder structure of the Totara distribution and the data folder on both hosts.

This option increases complexity, so instead of using separate hosts, consider installing a full Totara instance on the ML host and connecting it to a read-only database replica. That instance would not be reachable through the load balancer and would serve only as an ML processor. Use the method below only when the ML server must not have access to the Totara database or must not have a Totara instance installed.

On the Totara host:

30 1 * * * (cd /[your totara root path]/; php server/ml/recommender/cli/export_data.php)
*/5 * * * * (cd /[your totara root path]/; php server/ml/recommender/cli/import_recommendations.php)

On the Python host:

The Python execution command is obtained by running the following in the Totara root directory:

php server/ml/recommender/cli/recommender_command.php

The emitted Python command will look similar to the following (see the parameter descriptions below):

'/usr/bin/python3.7' '/var/www/totara/src/work/reorg/extensions/ml_recommender/python/ml_recommender.py' --query 'hybrid' ...

This script emits the full command required to run the Python script for the recommendations process, according to the current configuration on the settings page.

If the host where Python is running has the shared volume mapped to a different path, the --data_path parameter must be adjusted accordingly.

Example cron entry:

*/5 * * * * /usr/bin/python3.7 /var/www/totara/src/work/reorg/extensions/ml_recommender/python/ml_recommender.py --query 'hybrid' --result_count_user '25' --result_count_item '15' --threads '6' --data_path '/var/www/totara/data/recommender/data' --content_filtering 'True'
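
Adjusting the data path in the emitted command can be done by hand, or with a small helper such as the sketch below. The function name remap_data_path and the mount point used in the usage note are hypothetical. The sketch assumes the command is a single plain command line in which --data_path is followed by its value as a separate argument, as in the example above.

```python
import shlex


def remap_data_path(command, new_data_path):
    """Return the command line with the value following --data_path replaced."""
    args = shlex.split(command)
    if "--data_path" not in args:
        raise ValueError("command has no --data_path parameter")
    index = args.index("--data_path")
    if index + 1 >= len(args):
        raise ValueError("--data_path has no value")
    args[index + 1] = new_data_path
    # Re-quote each argument so the result is safe to paste into a crontab entry.
    return " ".join(shlex.quote(arg) for arg in args)
```

For example, calling remap_data_path(emitted_command, "/mnt/totara_share/recommender/data") returns the same command pointing at the Python host's view of the shared volume, leaving every other parameter untouched.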

Cron frequency

Running the export process once a day is generally adequate. However, the optimum frequency depends on many factors, such as how active the site's users are and how much new content is created daily.

Configuration

To enable recommender system checks, add the following line to the config.php configuration file for your site:

$CFG->preventexecpath = false;

Log in to the Totara instance and navigate to the settings page: https://your_url/admin/settings.php?section=ml_setting_recommender

The Python3 binary path is used only to generate the executable command string (which is run via eval). Python is not run from within the PHP environment.

Parameters to the ml_recommender.py script

--query

Processing mode:

  • Full Hybrid (Meta-data & Content): uses content data, item meta-data and user-item interaction data (longest processing time, highest granularity)
  • Partial Hybrid: uses item meta-data and user-item interaction data
  • Matrix Factorisation: uses only user-item interaction data (shortest processing time, lowest granularity)

Type: text. Input: dropdown. Default: Full Hybrid.

--threads

Number of cores/threads that may be used by the recommendation library (should be fewer than the number of physical cores).

Type: numeric. Input: dropdown. Default: 1.

--result_count_user

User result count: the number of items-to-user recommendations to return.

Type: numeric. Input: dropdown. Default: 10.

--result_count_item

Item result count: the number of items-to-item recommendations to return.

Type: numeric. Input: dropdown. Default: 10.

--data_path

Path to the exported data files.

Type: text. Input: filesystem path. Default: /totara_data_root/recommender/data.

--interactions_period

The period of user-item interactions to limit recommendations to, e.g. previous week, previous 2 weeks, previous 4 weeks.

Type: text. Input: dropdown. Default: ??? weeks, months.

--path_to_python

Location of the python3 executable on the system.

Type: text. Input: filesystem path. Default: blank (the administrator must install Python and enter its location).
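
To show how these parameters fit together on a command line, here is a minimal sketch that assembles an argument list from a set of values. It is an illustration only, not how Totara itself builds the command: the function name build_recommender_args is hypothetical, only flags that appear in the example cron entry earlier on this page are used, and the value passed for --query must be whatever your own settings page emits.

```python
def build_recommender_args(python_path, script_path, data_path, query,
                           threads=1, result_count_user=10, result_count_item=10):
    """Assemble an argument list for the recommender script from individual values."""
    # The numeric defaults mirror the defaults listed above.
    numeric = (
        ("threads", threads),
        ("result_count_user", result_count_user),
        ("result_count_item", result_count_item),
    )
    for name, value in numeric:
        if not isinstance(value, int) or value < 1:
            raise ValueError("%s must be a positive integer" % name)
    return [
        python_path, script_path,
        "--query", query,
        "--result_count_user", str(result_count_user),
        "--result_count_item", str(result_count_item),
        "--threads", str(threads),
        "--data_path", data_path,
    ]
```

Returning a list rather than a single string avoids quoting problems when the command is passed to a process launcher such as subprocess.run.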

Benchmark results

The following indicative tests were run using only user-item interaction data processed through collaborative filtering (matrix factorisation). Memory requirements and run times will increase when extra features are processed via content-based filtering (i.e. user and item meta-data and/or content meta-data) in one of the hybrid processing modes.

Data sets used

Data set             Users    Items   Interactions
100K interactions      941     1685          99894
25M interactions    162541    59047       25000095


Benchmark results

AWS instance   Cores   Memory   100K interactions   25M interactions
t3.medium      2       4GB      1 min 20 sec        crashed (out of memory)
t3.xlarge      4       16GB     1 min 0 sec         10 hrs 35 min