Troubleshooting¶
This page describes typical failure scenarios and provides guidelines on how to troubleshoot them.
A resource is stuck in the state available¶
When a resource is stuck in the available state, it usually means that the agent that should deploy the resource is currently
down or paused. Click on the Resources tab of the web-console to get an
overview of the different resources in the model. This overview shows the state of each resource and the name of its agent.
Filter on resources in the available state and check which resources are ready to be deployed (i.e. a resource without
dependencies or a resource for which all dependencies were deployed successfully). The agent of such a resource is the agent that
causes the problem. In the figure below, the global GnmiResource should be ready to deploy on the spine agent.
Next, go to the Agents tab of the web-console to verify the state of that agent.
An agent can be in one of the following states:
Down
Paused
Up
The following subsections describe what should be done for each of these states.
The agent is down¶
The section Agent doesn’t come up explains how to troubleshoot an agent that is down while it should be up.
The agent is paused¶
Unpause the agent by clicking the Unpause button in the Agents tab of the web-console.
The agent is up¶
When the agent is in the up state, it should be ready to deploy resources. Read the agent log to verify it doesn’t contain
error or warning messages that would explain why the agent is not deploying any resources. For auto-started agents, three
different log files exist. The log files are present in <config.log-dir>/agent-<environment-id>.[log|out|err]. The
environment ID can be found in the URL of the web-console, or in the Settings tab.
More information about the different log files can be found
here. If the log file doesn’t provide any more information, trigger the agent to execute a
deployment by clicking on the Force repair button in the Agents tab of the web-console, as shown in the figure below:
When the agent receives the notification from the server, it writes the following log message in its log:
INFO inmanta.agent.agent Agent <agent-name> got a trigger to update in environment <environment ID>
If the notification from the server doesn’t appear in the log file of the agent after clicking the Force repair button, the
problem is situated on the server side. Check if the server log contains any error messages or warnings that could explain
why the agent didn’t get a notification from the server. The server log file is located at <config.log-dir>/server.log.
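The check for the trigger message described above can be sketched in a few lines of Python. This is a minimal sketch, not part of Inmanta: the function name `find_triggers` and the sample lines are hypothetical, and only the message text quoted above is matched because the exact layout of the surrounding log line (timestamps, padding) is an assumption.

```python
import re

# Pattern for the trigger message quoted above. Only the message itself is
# matched; anything the logger puts in front of it is ignored.
TRIGGER_RE = re.compile(
    r"Agent (?P<agent>\S+) got a trigger to update in environment (?P<env>\S+)"
)


def find_triggers(lines):
    """Return (agent, environment) pairs for every trigger message found."""
    found = []
    for line in lines:
        match = TRIGGER_RE.search(line)
        if match:
            found.append((match.group("agent"), match.group("env")))
    return found


# Illustrative log lines with a hypothetical agent name and environment ID.
sample = [
    "INFO inmanta.agent.agent Agent spine got a trigger to update in environment 1234-abcd",
    "INFO inmanta.agent.agent Some unrelated message",
]
print(find_triggers(sample))
```

An empty result after clicking Force repair would point towards the server side, as described above. Agent names that contain whitespace are not handled by this pattern.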
The deployment of a resource fails¶
When a resource cannot be deployed, it ends up in one of the following deployment states:
failed: A resource ends up in the failed state when the handler of that resource raises an uncaught exception. Check the log of the resource to get more details about the issue.
unavailable: A resource ends up in the unavailable state when no handler could be found to deploy that resource. Check the log of the resource to get more details about the issue.
undefined: A resource ends up in the undefined state when an attribute required by that resource didn’t yet resolve to a definite value. Read the section Check which attributes are undefined to find out which attributes are undefined.
skipped: When a resource is in the skipped state, it can mean two different things. Either the resource cannot be deployed because one of its dependencies ended up in the failed state, or the handler itself raised a SkipResource exception to indicate that the resource is not yet ready to be deployed. The latter case can occur when a VM is still booting, for example. Check the log of the resource to get more information about the actual root cause.
skipped_for_undefined: The skipped_for_undefined state indicates that the resource cannot be deployed because one of its dependencies cannot be deployed. Check the log of the resource to get information about the actual dependency that cannot be deployed.
Read the logs of a resource¶
This section describes how to obtain the logs for a specific resource. In the Resources tab of the web-console, click on
Show Details for the desired resource.
Next, in the Logs tab of this view, the logs can be sorted and filtered. Click on the chevron for a specific log line to display more
information, such as the traceback.
Check which attributes are undefined¶
To find the undefined attributes of a resource, click on Show Details for the resource in the undefined state, as shown in the figure below.
Look for attributes marked as undefined in the list of attributes of that resource (see figure below). Track the source of this attribute down within the configuration model to find out why this attribute is undefined.
Agent doesn’t come up¶
This section explains how to troubleshoot the problem where an agent is in the down state while it should be up. In the figure shown below, the four agents are down.
Read through the logs of the agent. These logs can be found at the following location:
<config.log-dir>/agent-<environment-id>.[log|out|err]. The environment ID is present in the URL of the web-console.
More information about the different log files can be found here. When reading
those log files, pay specific attention to error messages and warnings that could explain why the agent is marked as down.
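Scanning the three agent log files for warnings and errors can be sketched as follows. This is a minimal sketch under stated assumptions: the helper names are hypothetical, the log directory has to be taken from the config.log-dir option of your installation, and the script assumes that level names such as WARNING and ERROR appear as plain words in each log line.

```python
import re
from pathlib import Path

# Assumption: log level names appear as plain words somewhere in the line.
LEVEL_RE = re.compile(r"\b(WARNING|ERROR|CRITICAL)\b")


def agent_log_paths(log_dir, environment_id):
    """Build the paths of the three log files of an auto-started agent."""
    base = Path(log_dir)
    return [base / f"agent-{environment_id}.{ext}" for ext in ("log", "out", "err")]


def suspicious_lines(lines):
    """Keep only the lines that mention a warning or an error."""
    return [line.rstrip("\n") for line in lines if LEVEL_RE.search(line)]


def scan_agent_logs(log_dir, environment_id):
    """Map each existing agent log file to its warning and error lines."""
    report = {}
    for path in agent_log_paths(log_dir, environment_id):
        if path.is_file():
            with path.open(errors="replace") as handle:
                report[str(path)] = suspicious_lines(handle)
    return report
```

Calling `scan_agent_logs(log_dir, environment_id)` with the values of your own setup returns a dictionary keyed by log file path. Files that do not exist are skipped silently.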
Potential reasons why an agent wouldn’t start:
bind-address set incorrectly: The Inmanta server listens on all the interfaces configured via the server.bind-address option. If the server doesn’t listen on an interface used by a remote agent, the agent will not be able to connect to the server.
Authentication issue: If the Inmanta server has been set up with authentication, a misconfiguration may deny an agent access to the Inmanta API. For example, not configuring a token provider (issuer) with sign=true in the auth_jwt_<ID> section of the Inmanta configuration file. Documentation on how to configure authentication correctly can be found here.
SSL problems: If the Inmanta server is configured to use SSL, the agent should be configured to use SSL as well (see the SSL-related configuration options in the server and agent_rest_transport sections of the Inmanta configuration reference).
Network issue: Many network-related issues may prevent the agent from establishing a connection with the Inmanta server. A firewall may block traffic between the Inmanta agent and the server, no network route may exist towards the Inmanta server, etc.
Recompilation failed¶
You can trigger a recompilation from the Compile Reports tab. It shows a list of compile reports for the latest
compilations. Click on Show Details to see more information about a given report.
Each step of the compile process is shown. Click on the chevron, as shown below, for a specific step, to display more
information such as the output produced by that step and the return code. Verify that the
timestamp of the compile report corresponds to the time the compilation was triggered in the web-console. If no compile report was
generated or the compile report doesn’t show any errors, check the server logs as well. By default the server log is present in
<config.log-dir>/server.log.
Logs show “empty model” after export¶
This log message indicates that something went wrong during the compilation or the export of the model to the server. To get
more information about the problem, rerun the command with the -vvv and the -X options. The -vvv option increases
the log level of the command to the DEBUG level and the -X option shows stack traces and errors.
$ inmanta -vvv export -X
Compilation fails¶
In rare situations, the compiler might fail with a List modified after freeze or an
Optional variable accessed that has no value error, even though the model is syntactically correct. The following
sections describe why these errors occur and what can be done to make the compilation succeed.
Reason for compilation failure¶
When the compiler runs, it cannot know upfront how many elements will be added to a relationship. At some stages of the compilation process the compiler has to guess which relations are completely populated in order to be able to continue the compilation process. Heuristics are used to determine the correct order in which relationships can be considered completely populated. In most situations these heuristics work well, but in rare situations the compiler makes an incorrect decision and considers a relationship to be complete while it isn’t. In those situations the compiler crashes with one of the following exceptions:
List modified after freeze: This error occurs when a relationship with an upper arity larger than one was considered complete too soon.
Optional variable accessed that has no value: This error occurs when a [0:1] relationship was considered complete too soon.
The following sections provide information on how this issue can be resolved.
Relationship precedence policy¶
Warning
The Inmanta compiler is very good at determining in which order it should evaluate the orchestration model. Unfortunately, in very complex models it might not be able to do this. In that case you can give the compiler some instructions by providing it with relationship precedence rules.
This is a very powerful tool because you can override all the intelligence in the compiler. This means that if you provide the correct rule, it will fix the compilation. If you provide a wrong rule, it can make things even worse. However, it can never cause the orchestrator to compile incorrect results.
The above-mentioned problem can be resolved by defining a relation precedence policy in the project.yml
file of an Inmanta project. This policy consists of a list of rules. Each rule defines the order in which two
relationships should be considered complete with respect to each other. By providing this policy, it’s possible to
guide the compiler in making the correct decisions that lead to a successful compilation.
Example: Consider the following project.yml file.
name: quickstart
modulepath: libs
downloadpath: libs
repo: https://github.com/inmanta/
description: A quickstart project that installs a drupal website.
relation_precedence_policy:
  - "a::EntityA.relation before b::EntityB.other_relation"
The last two lines of this file define the relation precedence policy of the project. The policy contains one rule
saying that the relationship relation of entity a::EntityA should be considered completely populated before
the relation other_relation of entity b::EntityB can be considered complete.
Each rule in a relation precedence policy should have the following syntax:
<first-type>.<first-relation-name> before <then-type>.<then-relation-name>
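The rule syntax above can be checked before a rule is added to project.yml. The following is a minimal sketch, not the parser used by the Inmanta compiler: the names `parse_rule` and `PrecedenceRule` are hypothetical, and the pattern assumes that type names are `::`-separated identifiers and relation names are plain identifiers, as in the example rule.

```python
import re
from typing import NamedTuple

# Assumption: types look like "a::EntityA" and relation names are identifiers.
RULE_RE = re.compile(
    r"^(?P<first_type>\w+(?:::\w+)*)\.(?P<first_relation>\w+)"
    r"\s+before\s+"
    r"(?P<then_type>\w+(?:::\w+)*)\.(?P<then_relation>\w+)$"
)


class PrecedenceRule(NamedTuple):
    first_type: str
    first_relation: str
    then_type: str
    then_relation: str


def parse_rule(rule):
    """Split a precedence rule into its four parts, or raise ValueError."""
    match = RULE_RE.match(rule.strip())
    if match is None:
        raise ValueError(f"Not a valid relation precedence rule: {rule!r}")
    return PrecedenceRule(**match.groupdict())


print(parse_rule("a::EntityA.relation before b::EntityB.other_relation"))
```

A rule that omits the relation name or uses a keyword other than before is rejected with a ValueError, which catches typos before a compile is attempted.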
Compose a relationship precedence policy¶
Depending on the complexity of your model, it might be difficult to determine the rule(s) that should be added to the relation precedence policy to make the compile succeed. This section provides some guidelines to compose the correct set of rules.
When the compilation of a model fails with a List modified after freeze or an
Optional variable accessed that has no value error, the output from the compiler will contain information regarding
which relationship was frozen too soon.
For example, consider the following compiler output:
...
Exception explanation
=====================
The compiler could not figure out how to execute this model.
During compilation, the compiler has to decide when it expects a relation to have all its elements.
In this compiler run, it guessed that the relation 'finds' on the instance maze::ServiceA
(instantiated at /home/centos/maze_project/libs/maze/model/_init.cf:43) would be complete with the values [], but the
value maze::SubB (instantiated at /home/centos/maze_project/libs/maze/model/_init.cf:62) was added at
/home/centos/maze_project/libs/maze/model/_init.cf:75
...
In the above-mentioned example, the relationship maze::ServiceA.finds was incorrectly considered complete. To find
the other relation in the ordering conflict, compile the model once more with the log level set to DEBUG by passing
the -vvv option and grep for the log lines that contain the word freezing. The output will contain a log line
for each relationship that is considered complete. This gives you an overview of the order in which the
compiler considers the different relations to be complete.
$ inmanta -vvv compile|grep -i freezing
...
inmanta.execute.schedulerLevel 3 Freezing ListVariable maze::ServiceA (instantiated at /home/centos/maze_project/libs/maze/model/_init.cf:43) maze::ServiceA.finds = []
inmanta.execute.schedulerLevel 3 Freezing ListVariable maze::ServiceA (instantiated at /home/centos/maze_project/libs/maze/model/_init.cf:43) maze::ServiceA.finds = []
inmanta.execute.schedulerLevel 3 Freezing ListVariable maze::ServiceA (instantiated at /home/centos/maze_project/libs/maze/model/_init.cf:43) maze::ServiceA.finds = []
inmanta.execute.schedulerLevel 3 Freezing ListVariable maze::ServiceA (instantiated at /home/centos/maze_project/libs/maze/model/_init.cf:43) maze::ServiceA.finds = []
inmanta.execute.schedulerLevel 3 Freezing ListVariable maze::ServiceA (instantiated at /home/centos/maze_project/libs/maze/model/_init.cf:43) maze::ServiceA.finds = []
inmanta.execute.schedulerLevel 3 Freezing ListVariable maze::World (instantiated at /home/centos/maze_project/libs/maze/model/_init.cf:10) maze::World.services = [maze::ServiceA 7f8feb20f700, maze::ServiceA 7f8feb20faf0, maze::ServiceA 7f8feb20fee0, maze::ServiceA 7f8feb1e7310, maze::ServiceA 7f8feb1e7700]
Could not set attribute `finds` on instance `maze::ServiceA
...
All the relationships frozen after the freeze of the maze::ServiceA.finds relationship are potentially causing the
compilation problem. In the above-mentioned example, there is only one, namely the maze::World.services relationship.
As such, the following rule should be added to the relation precedence policy to resolve this specific conflict:
maze::World.services before maze::ServiceA.finds
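Deriving such candidate rules from the DEBUG output can be sketched as follows. This is a minimal sketch under stated assumptions: the helper names are hypothetical, the pattern assumes that each freeze line ends with `<type>.<relation> = <value>` right after the `(instantiated at ...)` part as in the output above, and the first freeze of the affected relationship is taken as the reference point. The sample lines are taken from the output above, with the value list of the last line abbreviated.

```python
import re

# Assumption: the relation name follows the first closing parenthesis on the line.
FREEZE_RE = re.compile(r"Freezing .*?\)\s+(?P<relation>\S+\.\w+)\s+=\s")


def freeze_order(lines):
    """Return relation names in the order the compiler froze them."""
    order = []
    for line in lines:
        match = FREEZE_RE.search(line)
        if match:
            order.append(match.group("relation"))
    return order


def candidate_rules(lines, frozen_too_soon):
    """Suggest one rule per relation frozen after `frozen_too_soon`."""
    order = freeze_order(lines)
    if frozen_too_soon not in order:
        return []
    later = order[order.index(frozen_too_soon) + 1:]
    rules = []
    for relation in later:
        rule = f"{relation} before {frozen_too_soon}"
        if relation != frozen_too_soon and rule not in rules:
            rules.append(rule)
    return rules


sample = [
    "inmanta.execute.schedulerLevel 3 Freezing ListVariable maze::ServiceA "
    "(instantiated at /home/centos/maze_project/libs/maze/model/_init.cf:43) "
    "maze::ServiceA.finds = []",
    "inmanta.execute.schedulerLevel 3 Freezing ListVariable maze::ServiceA "
    "(instantiated at /home/centos/maze_project/libs/maze/model/_init.cf:43) "
    "maze::ServiceA.finds = []",
    "inmanta.execute.schedulerLevel 3 Freezing ListVariable maze::World "
    "(instantiated at /home/centos/maze_project/libs/maze/model/_init.cf:10) "
    "maze::World.services = [maze::ServiceA 7f8feb20f700]",
]
print(candidate_rules(sample, "maze::ServiceA.finds"))
```

The result is a list of candidates, not a verdict: when several relationships were frozen after the affected one, each suggested rule still has to be judged against the model.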
When you compile the model once more with the relation precedence policy in place, the compilation can either succeed
or fail with another List modified after freeze or Optional variable accessed that has no value error. The
latter case indicates that a second rule should be added to the relation precedence policy.
Debugging¶
Debugging the server is possible if the rpdb package is installed.
The rpdb package can be installed into the virtual environment used by Inmanta by default as follows:
$ /opt/inmanta/bin/python3 -m pip install rpdb
rpdb can be triggered by sending a TRAP signal (SIGTRAP, signal number 5 on Linux) to the Inmanta server process.
$ kill -5 <PID>
After receiving the signal, the process hangs, and it’s possible to attach a pdb debugger
by connecting to 127.0.0.1, on port 4444 (for example using telnet).