Debugging Attribution Engine

As of v2.4.7, a few known minor issues can cause the application to fail.

One indicator is a prediction task that takes too long while its SEARCH status remains PENDING.

When this happens, attribution-database is not updated. Because attribution-api is built to return the stored results when a sample was previously run successfully, and the row was never updated, we have to delete the row from attribution-database manually.

Typically, the model modules continue running separately, so they will likely have succeeded. After deleting the row, rerunning the failed modules on ACUBE should promptly return the successful results.

Solution

Generally, the following steps can be used to solve most issues.

1. Revoke the ongoing task

There is an endpoint that can revoke a sample, that is, force-change the status of all its phases to FAILURE. This allows a resubmission to rerun everything from PREPROCESSING to SEARCH. The endpoint is used to fix the error where a task dies due to a fault along the messaging queue.

warning

The caveat is that this method won't work if the task id can no longer be found. One such edge case is restarting the container, which flushes the amqp tasks from redis (the celery-service backend).

Version 2.4.7

This endpoint has a bug in v2.4.7: for samples that were previously predicted successfully, revoking does not work as intended, and you will have to resort to step 3. This is fixed in v2.4.8. You can check your version here.

2. Rerun submission

As the admin, you can rerun the correlation module for that specific submission.

  • If the sample has been run before, a minute of waiting should suffice.
  • If it's a new sample, wait and observe whether the PREPROCESSING stage in detailed_status has changed.

If the results are then returned successfully, you have fixed the issue.
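The wait-and-observe check for a new sample can be scripted. The following is a minimal sketch only: how detailed_status is fetched is not described here, so the helper takes a hypothetical `fetch_detailed_status` callable, and it assumes detailed_status is a mapping from phase name to status string. The function name, parameters, and default timings are all illustrative.

```python
import time
from typing import Callable, Mapping, Optional


def wait_for_stage_change(
    fetch_detailed_status: Callable[[], Mapping[str, str]],  # hypothetical fetcher
    stage: str = "PREPROCESSING",
    timeout_s: float = 60.0,
    interval_s: float = 5.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Optional[str]:
    """Return the new status once `stage` changes, or None if it never does."""
    # Remember the status the stage had when we started watching.
    initial = fetch_detailed_status().get(stage)
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        sleep(interval_s)
        current = fetch_detailed_status().get(stage)
        if current != initial:
            return current
    # The stage did not change within the timeout.
    return None
```

A return value of None suggests the resubmission is stuck and the next step is worth trying.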

3. Delete the row

Manually deleting the specific row from the database can resolve some edge cases that the revoke endpoint does not fix.

***=# delete from predictions where sha256_hash='dcb6d7a68fbe3d8b3ed1a92a945260565cdbe66166d2c275a179b92261413e50';

Check out how to access the database here

After this step, repeat step 2. If the results are then returned successfully, you have fixed the issue.
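If the delete is scripted rather than typed at the prompt, it helps to validate the hash first and to confirm how many rows were removed. The sketch below is illustrative only: it uses Python's built-in sqlite3 module as a stand-in so that it runs anywhere, whereas the real attribution-database would need its own driver and connection details (and possibly a different placeholder syntax). Only the `predictions` table and the `sha256_hash` column come from the statement above; the `result` column and the function name are hypothetical.

```python
import sqlite3


def delete_prediction(conn: sqlite3.Connection, sha256_hash: str) -> int:
    """Delete the predictions row(s) for one sample; return the rows removed."""
    # A SHA-256 digest is 64 hexadecimal characters; reject anything else
    # so a truncated or mistyped hash never reaches the database.
    if len(sha256_hash) != 64 or any(
        c not in "0123456789abcdef" for c in sha256_hash.lower()
    ):
        raise ValueError("expected a 64-character hexadecimal SHA-256 hash")
    with conn:  # commits on success, rolls back if an exception is raised
        cur = conn.execute(
            "DELETE FROM predictions WHERE sha256_hash = ?",  # parameterized
            (sha256_hash,),
        )
    return cur.rowcount


# Stand-in database for demonstration; the `result` column is hypothetical.
demo = sqlite3.connect(":memory:")
demo.execute("CREATE TABLE predictions (sha256_hash TEXT, result TEXT)")
sample = "dcb6d7a68fbe3d8b3ed1a92a945260565cdbe66166d2c275a179b92261413e50"
demo.execute("INSERT INTO predictions VALUES (?, ?)", (sample, "stale"))
removed = delete_prediction(demo, sample)
```

A return value of 0 means no row matched the hash, so there was nothing to delete and the problem lies elsewhere.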

tip

Revoke endpoint advantage

If the model or the database of similar functions changes, the sample needs to be rerun. Because attribution-api returns the results if the sample was previously run successfully, you will need the revoke endpoint to change the status of the binary so that the models error out and reprocess the sample.

Version 2.4.8

In v2.4.8, you can skip this step because this method has been incorporated into the revoke endpoint itself.

4. Restart the containers

If issues persist or any container has become unresponsive, a restart is warranted.

docker compose -f docker-compose.stg.yaml down
docker compose -f docker-compose.stg.yaml up -d

After the restart, repeat step 2. If the results are then returned successfully, you have fixed the issue.