job_shop_lib.reinforcement_learning

Package for reinforcement learning components.

class ObservationSpaceKey(value, names=None, *, module=None, qualname=None, type=None, start=1, boundary=None)[source]

Bases: str, Enum

Enumeration of the keys for the observation space dictionary.

REMOVED_NODES = 'removed_nodes'
EDGE_INDEX = 'edge_index'
OPERATIONS = 'operations'
JOBS = 'jobs'
MACHINES = 'machines'
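
Because the enumeration also inherits from str, its members compare equal to their string values and can be used directly as keys of the observation dictionary. A minimal sketch (the observation mapping below is a hypothetical placeholder):

from job_shop_lib.reinforcement_learning import ObservationSpaceKey

# Members are plain strings, so the member and its value index the same entry.
assert ObservationSpaceKey.EDGE_INDEX == "edge_index"

def get_edge_index(observation):
    # "observation" is assumed to be an ObservationDict-like mapping.
    return observation[ObservationSpaceKey.EDGE_INDEX.value]
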
class RewardObserver(dispatcher, *, subscribe=True)[source]

Bases: DispatcherObserver

Base class for all reward functions.

Parameters:
rewards

List of rewards calculated for each operation scheduled by the dispatcher.

property last_reward: float

Returns the reward of the last step or 0 if no rewards have been calculated.

reset()[source]

Sets rewards attribute to a new empty list.

Return type:

None
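
As a sketch of how a custom reward plugs into this base class, the subclass below appends a constant penalty for every scheduled operation. The update signature mirrors the one documented for MakespanReward and IdleTimeReward below; the import paths and the class name StepPenaltyReward are assumptions for illustration only.

from job_shop_lib import ScheduledOperation
from job_shop_lib.reinforcement_learning import RewardObserver

class StepPenaltyReward(RewardObserver):
    """Hypothetical reward observer: -1 for every operation scheduled."""

    def update(self, scheduled_operation: ScheduledOperation) -> None:
        # "rewards" is the list documented above; the dispatcher calls
        # update() once per scheduled operation.
        self.rewards.append(-1.0)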

class MakespanReward(dispatcher, *, subscribe=True)[source]

Bases: RewardObserver

Dense reward function based on the negative makespan of the schedule.

The reward is calculated as the difference between the makespan of the schedule before and after the last operation was scheduled. The makespan is the time at which the last operation is completed.

Parameters:

dispatcher (Dispatcher)

current_makespan

Makespan of the schedule after the last operation was scheduled.

reset()[source]

Sets rewards attribute to a new empty list.

Return type:

None

update(scheduled_operation)[source]

Called when an operation is scheduled on a machine.

Parameters:

scheduled_operation (ScheduledOperation)
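
To make the definition concrete, the following self-contained snippet mimics the computation: each step's reward is the previous makespan minus the new one, so rewards are never positive and their sum equals the negative final makespan. The numbers are arbitrary toy values.

makespans = [0, 5, 8, 8, 12]  # makespan after each scheduled operation
rewards = [prev - curr for prev, curr in zip(makespans, makespans[1:])]
assert rewards == [-5, -3, 0, -4]
assert sum(rewards) == -makespans[-1]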

class IdleTimeReward(dispatcher, *, subscribe=True)[source]

Bases: RewardObserver

Dense reward function based on the negative idle time of the schedule.

The reward is calculated as the difference between the idle time of the schedule before and after the last operation was scheduled. The idle time is the sum of the gaps between the end of one operation and the start of the next operation on the same machine.

Parameters:

dispatcher (Dispatcher)

update(scheduled_operation)[source]

Called when an operation is scheduled on a machine.

Parameters:

scheduled_operation (ScheduledOperation)
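
For illustration, the snippet below computes the idle time of a single machine from a list of (start, end) intervals. It is a standalone sketch of the quantity this reward penalizes, not the library's internal implementation.

# Operations already scheduled on one machine, as (start, end) pairs sorted by start.
intervals = [(0, 3), (5, 9), (9, 12)]
idle_time = sum(next_start - prev_end
                for (_, prev_end), (next_start, _) in zip(intervals, intervals[1:]))
assert idle_time == 2  # the machine is idle from t=3 to t=5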

class GanttChartWrapperConfig[source]

Bases: TypedDict

Configuration for creating the plot function with the plot_gantt_chart_wrapper function.

title: str | None
cmap: str
show_available_operations: bool
class GifConfig[source]

Bases: dict

Configuration for creating the GIF using the create_gannt_chart_video function.

gif_path: str | None
fps: int
remove_frames: bool
frames_dir: str | None
plot_current_time: bool
class VideoConfig[source]

Bases: TypedDict

Configuration for creating the video using the create_gannt_chart_video function.

video_path: str | None
fps: int
remove_frames: bool
frames_dir: str | None
plot_current_time: bool
class SingleJobShopGraphEnv(job_shop_graph, feature_observer_configs, reward_function_config=DispatcherObserverConfig(class_type=<class 'job_shop_lib.reinforcement_learning._reward_observers.MakespanReward'>, kwargs={}), graph_updater_config=DispatcherObserverConfig(class_type=<class 'job_shop_lib.graphs.graph_updaters._residual_graph_updater.ResidualGraphUpdater'>, kwargs={}), ready_operations_filter=<function filter_dominated_operations>, render_mode=None, render_config=None, use_padding=True)[source]

Bases: Env

A Gymnasium environment for solving a specific instance of the Job Shop Scheduling Problem represented as a graph.

This environment manages the scheduling process for a single Job Shop instance, using a graph representation and various observers to track the state and compute rewards.

Observation Space:

A dictionary with the following keys:

  • "removed_nodes": Binary vector indicating removed graph nodes.

  • "edge_index": Matrix of graph edges in COO format.

  • Feature matrices: Keys corresponding to the composite observer features (e.g., "operations", "jobs", "machines").

Action Space:

MultiDiscrete space representing (job_id, machine_id) pairs.

Render Modes:
  • "human": Displays the current Gantt chart.

  • "save_video": Saves a video of the complete Gantt chart.

  • "save_gif": Saves a GIF of the complete Gantt chart.

Parameters:
dispatcher

Manages the scheduling process. See Dispatcher.

composite_observer

A CompositeFeatureObserver which aggregates features from multiple observers.

graph_updater

Updates the graph representation after each action. See GraphUpdater.

reward_function

Computes rewards for actions taken. See RewardObserver.

action_space

Defines the action space. The action is a tuple of two integers (job_id, machine_id). The machine_id can be -1 if the selected operation can only be scheduled on one machine.

observation_space

Defines the observation space. The observation is a dictionary with the following keys:

  • "removed_nodes": Binary vector indicating removed graph nodes.

  • "edge_index": Matrix of graph edges in COO format.

  • Feature matrices: Keys corresponding to the composite observer features (e.g., "operations", "jobs", "machines").

render_mode

The mode for rendering the environment ("human", "save_video", "save_gif").

gantt_chart_creator

Creates Gantt chart visualizations. See GanttChartCreator.

use_padding

Whether to use padding in observations. Padding maintains the same observation shape across steps by padding each matrix to a fixed size.

metadata: dict[str, Any] = {'render_modes': ['human', 'save_video', 'save_gif']}
__init__(job_shop_graph, feature_observer_configs, reward_function_config=DispatcherObserverConfig(class_type=<class 'job_shop_lib.reinforcement_learning._reward_observers.MakespanReward'>, kwargs={}), graph_updater_config=DispatcherObserverConfig(class_type=<class 'job_shop_lib.graphs.graph_updaters._residual_graph_updater.ResidualGraphUpdater'>, kwargs={}), ready_operations_filter=<function filter_dominated_operations>, render_mode=None, render_config=None, use_padding=True)[source]

Initializes the SingleJobShopGraphEnv environment.

Parameters:
Return type:

None

property instance: JobShopInstance

Returns the instance the environment is working on.

property job_shop_graph: JobShopGraph

Returns the job shop graph.

reset(*, seed=None, options=None)[source]

Resets the environment.

Parameters:
  • seed (int | None)

  • options (dict[str, Any] | None)

Return type:

tuple[ObservationDict, dict]

step(action)[source]

Takes a step in the environment.

Parameters:

action (tuple[int, int]) -- The action to take. The action is a tuple of two integers (job_id, machine_id): the job ID and the ID of the machine on which to schedule the operation.

Returns:

  • The observation of the environment.

  • The reward obtained.

  • Whether the environment is done.

  • Whether the episode was truncated (always False).

  • A dictionary with additional information. The dictionary contains the following keys:

    • "feature_names": The names of the features in the observation.

    • "available_operations": The operations that are ready to be scheduled.

Return type:

A tuple containing the following elements
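
The return value follows the standard five-element Gymnasium convention. A short sketch, assuming env is an already constructed SingleJobShopGraphEnv and action is a valid (job_id, machine_id) pair:

observation, reward, done, truncated, info = env.step(action)
feature_names = info["feature_names"]            # names of the observation features
ready_operations = info["available_operations"]  # operations that can be scheduled next
if done:
    env.render()  # e.g. saves the completed Gantt chart if render_mode="save_gif"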

get_observation()[source]

Returns the current observation of the environment.

Return type:

ObservationDict

render()[source]

Renders the environment.

The rendering mode is set by the render_mode attribute:

  • human: Renders the current Gantt chart.

  • save_video: Saves a video of the Gantt chart. Used only if the schedule is completed.

  • save_gif: Saves a GIF of the Gantt chart. Used only if the schedule is completed.
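
A minimal end-to-end sketch of constructing and resetting the environment follows. The instance and graph constructors (Operation, JobShopInstance, build_disjunctive_graph), DispatcherObserverConfig, and the FeatureObserverType.DURATION member are assumptions about the wider job_shop_lib API and may need adjusting to the installed version.

from job_shop_lib import JobShopInstance, Operation
from job_shop_lib.dispatching import DispatcherObserverConfig
from job_shop_lib.dispatching.feature_observers import FeatureObserverType
from job_shop_lib.graphs import build_disjunctive_graph
from job_shop_lib.reinforcement_learning import SingleJobShopGraphEnv

# Toy 2-job, 2-machine instance: Operation(machine_id, duration).
jobs = [
    [Operation(0, 3), Operation(1, 2)],
    [Operation(1, 4), Operation(0, 1)],
]
instance = JobShopInstance(jobs, name="toy")
graph = build_disjunctive_graph(instance)

env = SingleJobShopGraphEnv(
    job_shop_graph=graph,
    feature_observer_configs=[
        DispatcherObserverConfig(class_type=FeatureObserverType.DURATION)
    ],
)
observation, info = env.reset()
print(observation["removed_nodes"].shape, observation["edge_index"].shape)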

class RenderConfig[source]

Bases: TypedDict

Configuration needed to initialize the GanttChartCreator class.

gantt_chart_wrapper_config: GanttChartWrapperConfig
video_config: VideoConfig
gif_config: GifConfig
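
Since these configuration classes are typed dictionaries, they can be written as plain dict literals. A sketch of a complete RenderConfig built only from the fields documented above (the file paths and colormap are placeholders):

render_config = {
    "gantt_chart_wrapper_config": {
        "title": "Toy instance",
        "cmap": "viridis",
        "show_available_operations": True,
    },
    "video_config": {
        "video_path": "gantt.mp4",
        "fps": 2,
        "remove_frames": True,
        "frames_dir": None,
        "plot_current_time": True,
    },
    "gif_config": {
        "gif_path": "gantt.gif",
        "fps": 2,
        "remove_frames": True,
        "frames_dir": None,
        "plot_current_time": True,
    },
}
# Passed as render_config=... together with render_mode="save_gif" or "save_video".
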
class ObservationDict[source]

Bases: dict

A dictionary containing the observation of the environment.

Required fields:

removed_nodes (np.ndarray): Binary vector indicating removed nodes.

edge_index (np.ndarray): Edge list in COO format.

Optional fields:

operations (np.ndarray): Matrix of operation features.

jobs (np.ndarray): Matrix of job features.

machines (np.ndarray): Matrix of machine features.

removed_nodes: ndarray
edge_index: ndarray
operations: ndarray
jobs: ndarray
machines: ndarray
add_padding(array, output_shape, padding_value=-1, dtype=None)[source]

Adds padding to the array.

Pads the input array to the specified output shape with a given padding value. If the dtype is not specified, the dtype of the input array is used.

Parameters:
  • array (ndarray[Any, dtype[Any]]) -- The input array to be padded.

  • output_shape (tuple[int, ...]) -- The desired shape of the output array.

  • padding_value (float) -- The value to use for padding. Defaults to -1.

  • dtype (type[T] | None) -- The data type for the output array. Defaults to None, in which case the dtype of the input array is used.

Returns:

The padded array with the specified output shape.

Raises:

ValidationError -- If the output shape is smaller than the input shape.

Return type:

ndarray[Any, dtype[T]]

Examples:

>>> array = np.array([[1, 2], [3, 4]])
>>> add_padding(array, (3, 3))
array([[ 1,  2, -1],
       [ 3,  4, -1],
       [-1, -1, -1]])

>>> add_padding(array, (3, 3), padding_value=0)
array([[1, 2, 0],
       [3, 4, 0],
       [0, 0, 0]])

>>> bool_array = np.array([[True, False], [False, True]])
>>> add_padding(bool_array, (3, 3), padding_value=False, dtype=int)
array([[1, 0, 0],
       [0, 1, 0],
       [0, 0, 0]])

>>> add_padding(bool_array, (3, 3), dtype=int)
array([[ 1,  0, -1],
       [ 0,  1, -1],
       [-1, -1, -1]])
class MultiJobShopGraphEnv(instance_generator, feature_observer_configs, graph_initializer=<function build_agent_task_graph>, graph_updater_config=DispatcherObserverConfig(class_type=<class 'job_shop_lib.graphs.graph_updaters._residual_graph_updater.ResidualGraphUpdater'>, kwargs={}), ready_operations_filter=<function filter_dominated_operations>, reward_function_config=DispatcherObserverConfig(class_type=<class 'job_shop_lib.reinforcement_learning._reward_observers.MakespanReward'>, kwargs={}), render_mode=None, render_config=None, use_padding=True)[source]

Bases: Env

Gymnasium environment for solving multiple Job Shop Scheduling Problems using reinforcement learning and Graph Neural Networks.

This environment generates a new Job Shop Scheduling Problem instance for each reset, creates a graph representation, and manages the scheduling process using a Dispatcher.

The observation space includes:

  • removed_nodes: Binary vector indicating removed nodes.

  • edge_index: Edge list in COO format.

  • operations: Matrix of operation features.

  • jobs: Matrix of job features (if applicable).

  • machines: Matrix of machine features (if applicable).

Internally, the class creates a SingleJobShopGraphEnv environment to manage the scheduling process for each JobShopInstance.
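
A minimal construction sketch is shown below; GeneralInstanceGenerator (and its parameters), DispatcherObserverConfig, and FeatureObserverType.DURATION are assumptions about the surrounding job_shop_lib API. Every call to reset() draws a fresh instance from the generator.

from job_shop_lib.generation import GeneralInstanceGenerator
from job_shop_lib.dispatching import DispatcherObserverConfig
from job_shop_lib.dispatching.feature_observers import FeatureObserverType
from job_shop_lib.reinforcement_learning import MultiJobShopGraphEnv

generator = GeneralInstanceGenerator(
    num_jobs=(3, 5), num_machines=(3, 4), seed=42
)
env = MultiJobShopGraphEnv(
    instance_generator=generator,
    feature_observer_configs=[
        DispatcherObserverConfig(class_type=FeatureObserverType.DURATION)
    ],
)
observation, info = env.reset()  # a new instance is generated here
observation, info = env.reset()  # and a different one here
print(env.instance)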

Parameters:
instance_generator

An InstanceGenerator that generates a new problem instance on each reset.

action_space

A gymnasium.spaces.Discrete action space with size equal to the maximum number of jobs.

observation_space

Dictionary of observation spaces. Keys are defined in ObservationSpaceKey.

single_job_shop_graph_env

Environment for a specific Job Shop Scheduling Problem instance. See SingleJobShopGraphEnv.

graph_initializer

Function to create the initial graph representation. It should take a JobShopInstance as input and return a JobShopGraph.

render_mode

Rendering mode for visualization. Supported modes are:

  • human: Renders the current Gantt chart.

  • save_video: Saves a video of the Gantt chart. Used only if the schedule is completed.

  • save_gif: Saves a GIF of the Gantt chart. Used only if the schedule is completed.

render_config

Configuration for rendering. See RenderConfig.

feature_observer_configs

List of DispatcherObserverConfig for feature observers.

reward_function_config

Configuration for the reward function. See DispatcherObserverConfig and RewardObserver.

graph_updater_config

Configuration for the graph updater. The graph updater is used to update the graph representation after each action. See DispatcherObserverConfig and GraphUpdater.

__init__(instance_generator, feature_observer_configs, graph_initializer=<function build_agent_task_graph>, graph_updater_config=DispatcherObserverConfig(class_type=<class 'job_shop_lib.graphs.graph_updaters._residual_graph_updater.ResidualGraphUpdater'>, kwargs={}), ready_operations_filter=<function filter_dominated_operations>, reward_function_config=DispatcherObserverConfig(class_type=<class 'job_shop_lib.reinforcement_learning._reward_observers.MakespanReward'>, kwargs={}), render_mode=None, render_config=None, use_padding=True)[source]

Initializes the environment.

Parameters:
  • instance_generator (InstanceGenerator) -- An InstanceGenerator that generates a new problem instance on each reset.

  • feature_observer_configs (Sequence[DispatcherObserverConfig[type[FeatureObserver]] | DispatcherObserverConfig[FeatureObserverType] | DispatcherObserverConfig[str]]) -- Configurations for feature observers. Each configuration should be a DispatcherObserverConfig with a class type that inherits from FeatureObserver or a string or enum that represents a built-in feature observer.

  • graph_initializer (Callable[[JobShopInstance], JobShopGraph]) -- Function to create the initial graph representation. If None, the default graph initializer is used: build_agent_task_graph().

  • graph_updater_config (DispatcherObserverConfig[type[GraphUpdater]]) -- Configuration for the graph updater. The graph updater is used to update the graph representation after each action. If None, the default graph updater is used: ResidualGraphUpdater.

  • ready_operations_filter (Callable[[Dispatcher, list[Operation]], list[Operation]]) -- Function to filter ready operations. If None, the default filter is used: filter_dominated_operations().

  • reward_function_config (DispatcherObserverConfig[type[RewardObserver]]) -- Configuration for the reward function. If None, the default reward function is used: MakespanReward.

  • render_mode (str | None) --

    Rendering mode for visualization. Supported modes are:

    • human: Renders the current Gantt chart.

    • save_video: Saves a video of the Gantt chart. Used only if the schedule is completed.

    • save_gif: Saves a GIF of the Gantt chart. Used only if the schedule is completed.

  • render_config (RenderConfig | None) -- Configuration for rendering. See RenderConfig.

  • use_padding (bool) -- Whether to use padding in observations. If True, all matrices are padded to fixed sizes based on the maximum instance size. Values are padded with -1, except for the "removed_nodes" key, which is padded with True, indicating that the node is removed.

Return type:

None

property dispatcher: Dispatcher

Returns the current dispatcher instance.

property reward_function: RewardObserver

Returns the current reward function instance.

property ready_operations_filter: Callable[[Dispatcher, list[Operation]], list[Operation]] | None

Returns the current ready operations filter.

property use_padding: bool

Returns whether padding is used in observations.

property job_shop_graph: JobShopGraph

Returns the current job shop graph.

property instance: JobShopInstance

Returns the current job shop instance.

reset(*, seed=None, options=None)[source]

Resets the environment and returns the initial observation.

Parameters:
  • seed (int | None) -- Random seed for reproducibility.

  • options (dict[str, Any] | None) -- Additional options for reset (currently unused).

Returns:

  • ObservationDict: The initial observation of the environment.

  • dict: An info dictionary containing additional information about the reset state. This may include details about the generated instance or initial graph structure.

Return type:

A tuple containing

step(action)[source]

Takes a step in the environment.

Parameters:

action (tuple[int, int]) -- The action to take. The action is a tuple of two integers (job_id, machine_id): the job ID and the ID of the machine on which to schedule the operation.

Returns:

  • The observation of the environment.

  • The reward obtained.

  • Whether the environment is done.

  • Whether the episode was truncated (always False).

  • A dictionary with additional information. The dictionary contains the following keys:

    • "feature_names": The names of the features in the observation.

    • "available_operations": The operations that are ready to be scheduled.

Return type:

A tuple containing the following elements

render()[source]

Compute the render frames as specified by render_mode during the initialization of the environment.

The environment's metadata render modes (env.metadata["render_modes"]) should contain the possible ways to implement the render modes. In addition, list versions of most render modes are achieved through gymnasium.make, which automatically applies a wrapper to collect rendered frames.

Note

As the render_mode is known during __init__, the objects used to render the environment state should be initialised in __init__.

By convention, if the render_mode is:

  • None (default): no render is computed.

  • "human": The environment is continuously rendered in the current display or terminal, usually for human consumption. This rendering should occur during step() and render() doesn't need to be called. Returns None.

  • "rgb_array": Return a single frame representing the current state of the environment. A frame is a np.ndarray with shape (x, y, 3) representing RGB values for an x-by-y pixel image.

  • "ansi": Return a strings (str) or StringIO.StringIO containing a terminal-style text representation for each time step. The text can include newlines and ANSI escape sequences (e.g. for colors).

  • "rgb_array_list" and "ansi_list": List based version of render modes are possible (except Human) through the wrapper, gymnasium.wrappers.RenderCollection that is automatically applied during gymnasium.make(..., render_mode="rgb_array_list"). The frames collected are popped after render() is called or reset().

Note

Make sure that your class's metadata "render_modes" key includes the list of supported modes.

Changed in version 0.25.0: The render function was changed to no longer accept parameters; these parameters should instead be specified when the environment is initialized, e.g., gymnasium.make("CartPole-v1", render_mode="human").

Return type:

None