job_shop_lib.reinforcement_learning¶
Package for reinforcement learning components.
- class ObservationSpaceKey(value, names=None, *, module=None, qualname=None, type=None, start=1, boundary=None)[source]¶
Bases: str, Enum
Enumeration of the keys for the observation space dictionary.
- REMOVED_NODES = 'removed_nodes'¶
- EDGE_INDEX = 'edge_index'¶
- OPERATIONS = 'operations'¶
- JOBS = 'jobs'¶
- MACHINES = 'machines'¶
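Because the enum inherits from str, each member's value is the literal dictionary key. A minimal sketch using a toy observation dictionary (the values are illustrative, not taken from a real environment):
import numpy as np

from job_shop_lib.reinforcement_learning import ObservationSpaceKey

# Toy observation dictionary keyed by the documented values.
obs = {
    ObservationSpaceKey.REMOVED_NODES.value: np.zeros(4, dtype=bool),
    ObservationSpaceKey.EDGE_INDEX.value: np.array([[0, 1, 2], [1, 2, 3]]),
}

# str inheritance makes members compare equal to their string values...
assert ObservationSpaceKey.EDGE_INDEX == "edge_index"
# ...and .value gives the plain string key for dictionary lookups.
edge_index = obs[ObservationSpaceKey.EDGE_INDEX.value]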
- class RewardObserver(dispatcher, *, subscribe=True)[source]¶
Bases: DispatcherObserver
Base class for all reward functions.
- Parameters:
dispatcher (Dispatcher)
subscribe (bool)
- rewards¶
List of rewards calculated for each operation scheduled by the dispatcher.
- property last_reward: float¶
Returns the reward of the last step or 0 if no rewards have been calculated.
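A subclass only needs to implement update() and append to rewards; last_reward then works automatically. A minimal, hypothetical sketch (ConstantPenaltyReward is not part of the library):
from job_shop_lib.reinforcement_learning import RewardObserver


class ConstantPenaltyReward(RewardObserver):
    """Hypothetical observer emitting a fixed -1 per scheduled operation."""

    def update(self, scheduled_operation):
        # The dispatcher calls this hook after each operation is scheduled;
        # appending here makes the value visible through `last_reward`.
        self.rewards.append(-1.0)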
- class MakespanReward(dispatcher, *, subscribe=True)[source]¶
Bases: RewardObserver
Dense reward function based on the negative makespan of the schedule.
The reward is calculated as the difference between the makespan of the schedule before and after the last operation was scheduled. The makespan is the time at which the last operation is completed.
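For example, if the makespan is 10 before a step and 13 after the chosen operation is scheduled, the reward for that step is 10 - 13 = -3; maximizing the cumulative reward therefore minimizes the final makespan.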
- Parameters:
dispatcher (Dispatcher)
- current_makespan¶
Makespan of the schedule after the last operation was scheduled.
- update(scheduled_operation)[source]¶
Called when an operation is scheduled on a machine.
- Parameters:
scheduled_operation (ScheduledOperation)
- class IdleTimeReward(dispatcher, *, subscribe=True)[source]¶
Bases: RewardObserver
Dense reward function based on the negative idle time of the schedule.
The reward is calculated as the difference between the idle time of the schedule before and after the last operation was scheduled. The idle time is the sum of the gaps between the end of one operation and the start of the next on each machine.
- Parameters:
dispatcher (Dispatcher)
subscribe (bool)
- update(scheduled_operation)[source]¶
Called when an operation is scheduled on a machine.
- Parameters:
scheduled_operation (ScheduledOperation)
- class GanttChartWrapperConfig[source]¶
Bases: TypedDict
Configuration for creating the plot function with the plot_gantt_chart_wrapper function.
- title: str | None¶
- cmap: str¶
- show_available_operations: bool¶
- class GifConfig[source]¶
Bases: dict
Configuration for creating the GIF using the create_gannt_chart_video function.
- gif_path: str | None¶
- fps: int¶
- remove_frames: bool¶
- frames_dir: str | None¶
- plot_current_time: bool¶
- class VideoConfig[source]¶
Bases: TypedDict
Configuration for creating the video using the create_gannt_chart_video function.
- video_path: str | None¶
- fps: int¶
- remove_frames: bool¶
- frames_dir: str | None¶
- plot_current_time: bool¶
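Both configuration types are plain dictionaries at runtime, so they can be written as literals. A sketch with illustrative values (paths and frame rates are arbitrary):
from job_shop_lib.reinforcement_learning import GifConfig, VideoConfig

# Illustrative values; the keys follow the fields documented above.
gif_config: GifConfig = {
    "gif_path": "gantt_chart.gif",
    "fps": 2,
    "remove_frames": True,
    "frames_dir": None,
    "plot_current_time": True,
}
video_config: VideoConfig = {
    "video_path": "gantt_chart.mp4",
    "fps": 4,
    "remove_frames": True,
    "frames_dir": None,
    "plot_current_time": True,
}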
- class SingleJobShopGraphEnv(job_shop_graph, feature_observer_configs, reward_function_config=DispatcherObserverConfig(class_type=<class 'job_shop_lib.reinforcement_learning._reward_observers.MakespanReward'>, kwargs={}), graph_updater_config=DispatcherObserverConfig(class_type=<class 'job_shop_lib.graphs.graph_updaters._residual_graph_updater.ResidualGraphUpdater'>, kwargs={}), ready_operations_filter=<function filter_dominated_operations>, render_mode=None, render_config=None, use_padding=True)[source]¶
Bases: Env
A Gymnasium environment for solving a specific instance of the Job Shop Scheduling Problem represented as a graph.
This environment manages the scheduling process for a single Job Shop instance, using a graph representation and various observers to track the state and compute rewards.
- Observation Space:
A dictionary with the following keys:
- "removed_nodes": Binary vector indicating removed graph nodes.
- "edge_index": Matrix of graph edges in COO format.
- Feature matrices: Keys corresponding to the composite observer features (e.g., "operations", "jobs", "machines").
- Action Space:
MultiDiscrete space representing (job_id, machine_id) pairs.
- Render Modes:
"human": Displays the current Gantt chart.
"save_video": Saves a video of the complete Gantt chart.
"save_gif": Saves a GIF of the complete Gantt chart.
- Parameters:
job_shop_graph (JobShopGraph)
feature_observer_configs (Sequence[DispatcherObserverConfig[type[FeatureObserver]] | DispatcherObserverConfig[FeatureObserverType] | DispatcherObserverConfig[str]])
reward_function_config (DispatcherObserverConfig[type[RewardObserver]])
graph_updater_config (DispatcherObserverConfig[type[GraphUpdater]])
ready_operations_filter (Callable[[Dispatcher, list[Operation]], list[Operation]] | None)
render_mode (str | None)
render_config (RenderConfig | None)
use_padding (bool)
- dispatcher¶
Manages the scheduling process. See Dispatcher.
- composite_observer¶
A CompositeFeatureObserver which aggregates features from multiple observers.
- graph_updater¶
Updates the graph representation after each action. See GraphUpdater.
- reward_function¶
Computes rewards for actions taken. See RewardObserver.
- action_space¶
Defines the action space. The action is a tuple of two integers (job_id, machine_id). The machine_id can be -1 if the selected operation can only be scheduled on one machine.
- observation_space¶
Defines the observation space. The observation is a dictionary with the following keys:
- "removed_nodes": Binary vector indicating removed graph nodes.
- "edge_index": Matrix of graph edges in COO format.
- Feature matrices: Keys corresponding to the composite observer features (e.g., "operations", "jobs", "machines").
- render_mode¶
The mode for rendering the environment ("human", "save_video", "save_gif").
- gantt_chart_creator¶
Creates Gantt chart visualizations. See GanttChartCreator.
- use_padding¶
Whether to use padding in observations. Padding maintains a constant observation shape across steps.
- metadata: dict[str, Any] = {'render_modes': ['human', 'save_video', 'save_gif']}¶
- __init__(job_shop_graph, feature_observer_configs, reward_function_config=DispatcherObserverConfig(class_type=<class 'job_shop_lib.reinforcement_learning._reward_observers.MakespanReward'>, kwargs={}), graph_updater_config=DispatcherObserverConfig(class_type=<class 'job_shop_lib.graphs.graph_updaters._residual_graph_updater.ResidualGraphUpdater'>, kwargs={}), ready_operations_filter=<function filter_dominated_operations>, render_mode=None, render_config=None, use_padding=True)[source]¶
Initializes the SingleJobShopGraphEnv environment.
- Parameters:
job_shop_graph (JobShopGraph) -- The JobShopGraph instance representing the job shop problem.
feature_observer_configs (Sequence[DispatcherObserverConfig[type[FeatureObserver]] | DispatcherObserverConfig[FeatureObserverType] | DispatcherObserverConfig[str]]) -- A list of FeatureObserverConfig instances for the feature observers.
reward_function_config (DispatcherObserverConfig[type[RewardObserver]]) -- The configuration for the reward function.
graph_updater_config (DispatcherObserverConfig[type[GraphUpdater]]) -- The configuration for the graph updater.
ready_operations_filter (Callable[[Dispatcher, list[Operation]], list[Operation]] | None) -- The function to use for pruning dominated operations.
render_mode (str | None) -- The mode for rendering the environment ("human", "save_video", "save_gif").
render_config (RenderConfig | None) -- Configuration for rendering (e.g., paths for saving videos or GIFs).
use_padding (bool) -- Whether to use padding for the edge index.
- Return type:
None
- property instance: JobShopInstance¶
Returns the instance the environment is working on.
- property job_shop_graph: JobShopGraph¶
Returns the job shop graph.
- reset(*, seed=None, options=None)[source]¶
Resets the environment.
- Parameters:
seed (int | None)
options (dict[str, Any] | None)
- Return type:
tuple[ObservationDict, dict]
- step(action)[source]¶
Takes a step in the environment.
- Parameters:
action (tuple[int, int]) -- The action to take. The action is a tuple of two integers (job_id, machine_id): the job ID and the machine ID in which to schedule the operation.
- Returns:
A tuple containing the following elements:
- The observation of the environment.
- The reward obtained.
- Whether the environment is done.
- Whether the episode was truncated (always False).
- A dictionary with additional information, containing the following keys:
"feature_names": The names of the features in the observation.
"available_operations": The operations that are ready to be scheduled.
- Return type:
tuple[ObservationDict, float, bool, bool, dict]
- render()[source]¶
Renders the environment.
The rendering mode is set by the render_mode attribute:
- "human": Renders the current Gantt chart.
- "save_video": Saves a video of the Gantt chart. Used only if the schedule is completed.
- "save_gif": Saves a GIF of the Gantt chart. Used only if the schedule is completed.
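Putting the documented signatures together, a minimal usage sketch. The FeatureObserverType.DURATION member, the Operation constructor arguments, and the assumption that the info dictionary carries "available_operations" after reset() are illustrative; check them against your installed version:
from job_shop_lib import JobShopInstance, Operation
from job_shop_lib.dispatching.feature_observers import FeatureObserverType
from job_shop_lib.graphs import build_agent_task_graph
from job_shop_lib.reinforcement_learning import SingleJobShopGraphEnv

# A toy 2-job, 2-machine instance; Operation(machines, duration) is assumed.
job_0 = [Operation(0, 3), Operation(1, 2)]
job_1 = [Operation(1, 4), Operation(0, 1)]
instance = JobShopInstance([job_0, job_1], name="toy_instance")

env = SingleJobShopGraphEnv(
    job_shop_graph=build_agent_task_graph(instance),
    feature_observer_configs=[FeatureObserverType.DURATION],
)

obs, info = env.reset()
done = False
while not done:
    # Greedy policy: schedule the first ready operation on its machine.
    operation = info["available_operations"][0]
    action = (operation.job_id, operation.machine_id)
    obs, reward, done, truncated, info = env.step(action)

# Sum of per-step rewards; with the default MakespanReward this equals
# the negative final makespan.
print(sum(env.reward_function.rewards))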
- class RenderConfig[source]¶
Bases: TypedDict
Configuration needed to initialize the GanttChartCreator class.
- gantt_chart_wrapper_config: GanttChartWrapperConfig¶
- video_config: VideoConfig¶
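A sketch of a complete RenderConfig literal, reusing the configuration types defined above (values are illustrative):
from job_shop_lib.reinforcement_learning import RenderConfig

render_config: RenderConfig = {
    "gantt_chart_wrapper_config": {
        "title": "Toy instance",
        "cmap": "viridis",
        "show_available_operations": True,
    },
    "video_config": {
        "video_path": "gantt_chart.mp4",
        "fps": 4,
        "remove_frames": True,
        "frames_dir": None,
        "plot_current_time": True,
    },
}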
- class ObservationDict[source]¶
Bases: dict
A dictionary containing the observation of the environment.
- Required fields:
removed_nodes (np.ndarray): Binary vector indicating removed nodes.
edge_index (np.ndarray): Edge list in COO format.
- Optional fields:
operations (np.ndarray): Matrix of operation features.
jobs (np.ndarray): Matrix of job features.
machines (np.ndarray): Matrix of machine features.
- removed_nodes: ndarray¶
- edge_index: ndarray¶
- operations: ndarray¶
- jobs: ndarray¶
- machines: ndarray¶
- add_padding(array, output_shape, padding_value=-1, dtype=None)[source]¶
Adds padding to the array.
Pads the input array to the specified output shape with a given padding value. If dtype is not specified, the dtype of the input array is used.
- Parameters:
array (ndarray[Any, dtype[Any]]) -- The input array to be padded.
output_shape (tuple[int, ...]) -- The desired shape of the output array.
padding_value (float) -- The value to use for padding. Defaults to -1.
dtype (type[T] | None) -- The data type for the output array. Defaults to None, in which case the dtype of the input array is used.
- Returns:
The padded array with the specified output shape.
- Raises:
ValidationError -- If the output shape is smaller than the input shape.
- Return type:
ndarray[Any, dtype[T]]
Examples:
>>> array = np.array([[1, 2], [3, 4]])
>>> add_padding(array, (3, 3))
array([[ 1,  2, -1],
       [ 3,  4, -1],
       [-1, -1, -1]])
>>> add_padding(array, (3, 3), padding_value=0)
array([[1, 2, 0],
       [3, 4, 0],
       [0, 0, 0]])
>>> bool_array = np.array([[True, False], [False, True]])
>>> add_padding(bool_array, (3, 3), padding_value=False, dtype=int)
array([[1, 0, 0],
       [0, 1, 0],
       [0, 0, 0]])
>>> add_padding(bool_array, (3, 3), dtype=int)
array([[ 1,  0, -1],
       [ 0,  1, -1],
       [-1, -1, -1]])
- class MultiJobShopGraphEnv(instance_generator, feature_observer_configs, graph_initializer=<function build_agent_task_graph>, graph_updater_config=DispatcherObserverConfig(class_type=<class 'job_shop_lib.graphs.graph_updaters._residual_graph_updater.ResidualGraphUpdater'>, kwargs={}), ready_operations_filter=<function filter_dominated_operations>, reward_function_config=DispatcherObserverConfig(class_type=<class 'job_shop_lib.reinforcement_learning._reward_observers.MakespanReward'>, kwargs={}), render_mode=None, render_config=None, use_padding=True)[source]¶
Bases: Env
Gymnasium environment for solving multiple Job Shop Scheduling Problems using reinforcement learning and Graph Neural Networks.
This environment generates a new Job Shop Scheduling Problem instance for each reset, creates a graph representation, and manages the scheduling process using a Dispatcher.
The observation space includes:
removed_nodes: Binary vector indicating removed nodes.
edge_index: Edge list in COO format.
operations: Matrix of operation features.
jobs: Matrix of job features (if applicable).
machines: Matrix of machine features (if applicable).
Internally, the class creates a SingleJobShopGraphEnv environment to manage the scheduling process for each JobShopInstance.
- Parameters:
instance_generator (InstanceGenerator)
feature_observer_configs (Sequence[DispatcherObserverConfig[type[FeatureObserver]] | DispatcherObserverConfig[FeatureObserverType] | DispatcherObserverConfig[str]])
graph_initializer (Callable[[JobShopInstance], JobShopGraph])
graph_updater_config (DispatcherObserverConfig[type[GraphUpdater]])
ready_operations_filter (Callable[[Dispatcher, list[Operation]], list[Operation]])
reward_function_config (DispatcherObserverConfig[type[RewardObserver]])
render_mode (str | None)
render_config (RenderConfig | None)
use_padding (bool)
- instance_generator¶
An InstanceGenerator that generates a new problem instance on each reset.
- action_space¶
A gymnasium.spaces.Discrete action space with size equal to the maximum number of jobs.
- observation_space¶
Dictionary of observation spaces. Keys are defined in ObservationSpaceKey.
- single_job_shop_graph_env¶
Environment for a specific Job Shop Scheduling Problem instance. See SingleJobShopGraphEnv.
- graph_initializer¶
Function to create the initial graph representation. It should take a JobShopInstance as input and return a JobShopGraph.
- render_mode¶
Rendering mode for visualization. Supported modes are:
- "human": Renders the current Gantt chart.
- "save_video": Saves a video of the Gantt chart. Used only if the schedule is completed.
- "save_gif": Saves a GIF of the Gantt chart. Used only if the schedule is completed.
- render_config¶
Configuration for rendering. See RenderConfig.
- feature_observer_configs¶
List of DispatcherObserverConfig for feature observers.
- reward_function_config¶
Configuration for the reward function. See DispatcherObserverConfig and RewardObserver.
- graph_updater_config¶
Configuration for the graph updater. The graph updater is used to update the graph representation after each action. See DispatcherObserverConfig and GraphUpdater.
- __init__(instance_generator, feature_observer_configs, graph_initializer=<function build_agent_task_graph>, graph_updater_config=DispatcherObserverConfig(class_type=<class 'job_shop_lib.graphs.graph_updaters._residual_graph_updater.ResidualGraphUpdater'>, kwargs={}), ready_operations_filter=<function filter_dominated_operations>, reward_function_config=DispatcherObserverConfig(class_type=<class 'job_shop_lib.reinforcement_learning._reward_observers.MakespanReward'>, kwargs={}), render_mode=None, render_config=None, use_padding=True)[source]¶
Initializes the environment.
- Parameters:
instance_generator (InstanceGenerator) -- An InstanceGenerator that generates a new problem instance on each reset.
feature_observer_configs (Sequence[DispatcherObserverConfig[type[FeatureObserver]] | DispatcherObserverConfig[FeatureObserverType] | DispatcherObserverConfig[str]]) -- Configurations for feature observers. Each configuration should be a DispatcherObserverConfig with a class type that inherits from FeatureObserver, or a string or enum that represents a built-in feature observer.
graph_initializer (Callable[[JobShopInstance], JobShopGraph]) -- Function to create the initial graph representation. If None, the default graph initializer is used: build_agent_task_graph().
graph_updater_config (DispatcherObserverConfig[type[GraphUpdater]]) -- Configuration for the graph updater. The graph updater is used to update the graph representation after each action. If None, the default graph updater is used: ResidualGraphUpdater.
ready_operations_filter (Callable[[Dispatcher, list[Operation]], list[Operation]]) -- Function to filter ready operations. If None, the default filter is used: filter_dominated_operations().
reward_function_config (DispatcherObserverConfig[type[RewardObserver]]) -- Configuration for the reward function. If None, the default reward function is used: MakespanReward.
render_mode (str | None) -- Rendering mode for visualization. Supported modes are:
- "human": Renders the current Gantt chart.
- "save_video": Saves a video of the Gantt chart. Used only if the schedule is completed.
- "save_gif": Saves a GIF of the Gantt chart. Used only if the schedule is completed.
render_config (RenderConfig | None) -- Configuration for rendering. See RenderConfig.
use_padding (bool) -- Whether to use padding in observations. If True, all matrices are padded to fixed sizes based on the maximum instance size. Values are padded with -1, except for the "removed_nodes" key, which is padded with True, indicating that the node is removed.
- Return type:
None
- property dispatcher: Dispatcher¶
Returns the current dispatcher instance.
- property reward_function: RewardObserver¶
Returns the current reward function instance.
- property ready_operations_filter: Callable[[Dispatcher, list[Operation]], list[Operation]] | None¶
Returns the current ready operations filter.
- property use_padding: bool¶
Returns whether padding is used in observations.
- property job_shop_graph: JobShopGraph¶
Returns the current job shop graph.
- property instance: JobShopInstance¶
Returns the current job shop instance.
- reset(*, seed=None, options=None)[source]¶
Resets the environment and returns the initial observation.
- Parameters:
seed (int | None) -- Random seed for reproducibility.
options (dict[str, Any] | None) -- Additional options for reset (currently unused).
- Returns:
A tuple containing:
- ObservationDict: The initial observation of the environment.
- dict: An info dictionary containing additional information about the reset state. This may include details about the generated instance or initial graph structure.
- Return type:
tuple[ObservationDict, dict]
- step(action)[source]¶
Takes a step in the environment.
- Parameters:
action (tuple[int, int]) -- The action to take. The action is a tuple of two integers (job_id, machine_id): the job ID and the machine ID in which to schedule the operation.
- Returns:
A tuple containing the following elements:
- The observation of the environment.
- The reward obtained.
- Whether the environment is done.
- Whether the episode was truncated (always False).
- A dictionary with additional information, containing the following keys:
"feature_names": The names of the features in the observation.
"available_operations": The operations that are ready to be scheduled.
- Return type:
tuple[ObservationDict, float, bool, bool, dict]
- render()[source]¶
Compute the render frames as specified by render_mode during the initialization of the environment.
The environment's metadata render modes (env.metadata["render_modes"]) should contain the possible ways to implement the render modes. In addition, list versions for most render modes are achieved through gymnasium.make, which automatically applies a wrapper to collect rendered frames.
Note
As the render_mode is known during __init__, the objects used to render the environment state should be initialised in __init__.
By convention, if the render_mode is:
- None (default): no render is computed.
- "human": The environment is continuously rendered in the current display or terminal, usually for human consumption. This rendering should occur during step(), and render() doesn't need to be called. Returns None.
- "rgb_array": Returns a single frame representing the current state of the environment. A frame is an np.ndarray with shape (x, y, 3) representing RGB values for an x-by-y pixel image.
- "ansi": Returns a string (str) or a StringIO.StringIO containing a terminal-style text representation for each time step. The text can include newlines and ANSI escape sequences (e.g. for colors).
- "rgb_array_list" and "ansi_list": List-based versions of the render modes are possible (except "human") through the wrapper gymnasium.wrappers.RenderCollection, which is automatically applied during gymnasium.make(..., render_mode="rgb_array_list"). The frames collected are popped after render() is called or after reset().
Note
Make sure that your class's metadata "render_modes" key includes the list of supported modes.
Changed in version 0.25.0: The render function was changed to no longer accept parameters; rather, these parameters should be specified in the environment initialisation, i.e., gymnasium.make("CartPole-v1", render_mode="human").
- Return type:
None
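A minimal usage sketch for the multi-instance environment. GeneralInstanceGenerator and its keyword arguments are assumptions about the job_shop_lib.generation module; substitute the InstanceGenerator subclass you actually use:
from job_shop_lib.dispatching.feature_observers import FeatureObserverType
from job_shop_lib.generation import GeneralInstanceGenerator
from job_shop_lib.reinforcement_learning import MultiJobShopGraphEnv

# Assumed generator parameters: instances with 3-6 jobs and 3-5 machines.
generator = GeneralInstanceGenerator(num_jobs=(3, 6), num_machines=(3, 5))
env = MultiJobShopGraphEnv(
    instance_generator=generator,
    feature_observer_configs=[FeatureObserverType.DURATION],  # assumed member
)

# Each reset draws a fresh instance; with use_padding=True (the default),
# observation arrays keep a fixed maximum size across episodes.
obs, info = env.reset(seed=42)
print(env.instance)  # the freshly generated JobShopInstance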