job_shop_lib.reinforcement_learning¶
Package for reinforcement learning components.
- class ObservationSpaceKey(value, names=None, *, module=None, qualname=None, type=None, start=1, boundary=None)[source]¶
Bases: str, Enum
Enumeration of the keys for the observation space dictionary.
- REMOVED_NODES = 'removed_nodes'¶
- EDGE_INDEX = 'edge_index'¶
- OPERATIONS = 'operations'¶
- JOBS = 'jobs'¶
- MACHINES = 'machines'¶
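Because the enum inherits from str, each member's value is the literal dictionary key. A minimal sketch using a toy observation dictionary (the values are illustrative, not taken from a real environment):
import numpy as np

from job_shop_lib.reinforcement_learning import ObservationSpaceKey

# Toy observation dictionary keyed by the documented values.
obs = {
    ObservationSpaceKey.REMOVED_NODES.value: np.zeros(4, dtype=bool),
    ObservationSpaceKey.EDGE_INDEX.value: np.array([[0, 1, 2], [1, 2, 3]]),
}

# str inheritance makes members compare equal to their string values...
assert ObservationSpaceKey.EDGE_INDEX == "edge_index"
# ...and .value gives the plain string key for dictionary lookups.
edge_index = obs[ObservationSpaceKey.EDGE_INDEX.value]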
- class RewardObserver(dispatcher, *, subscribe=True)[source]¶
Bases: DispatcherObserver
Base class for all reward functions.
- Parameters:
dispatcher (Dispatcher)
subscribe (bool)
- rewards¶
List of rewards calculated for each operation scheduled by the dispatcher.
- property last_reward: float¶
Returns the reward of the last step or 0 if no rewards have been calculated.
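A subclass only needs to implement update() and append to rewards; last_reward then works automatically. A minimal, hypothetical sketch (ConstantPenaltyReward is not part of the library):
from job_shop_lib.reinforcement_learning import RewardObserver


class ConstantPenaltyReward(RewardObserver):
    """Hypothetical observer emitting a fixed -1 per scheduled operation."""

    def update(self, scheduled_operation):
        # The dispatcher calls this hook after each operation is scheduled;
        # appending here makes the value visible through `last_reward`.
        self.rewards.append(-1.0)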
- class MakespanReward(dispatcher, *, subscribe=True)[source]¶
Bases: RewardObserver
Dense reward function based on the negative makespan of the schedule.
The reward is calculated as the difference between the makespan of the schedule before and after the last operation was scheduled. The makespan is the time at which the last operation is completed.
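For example, if the makespan is 10 before a step and 13 after the chosen operation is scheduled, the reward for that step is 10 - 13 = -3; maximizing the cumulative reward therefore minimizes the final makespan.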
- Parameters:
dispatcher (Dispatcher)
- current_makespan¶
Makespan of the schedule after the last operation was scheduled.
- update(scheduled_operation)[source]¶
Called when an operation is scheduled on a machine.
- Parameters:
scheduled_operation (ScheduledOperation)
- class IdleTimeReward(dispatcher, *, subscribe=True)[source]¶
Bases: RewardObserver
Dense reward function based on the negative idle time of the schedule.
The reward is calculated as the difference between the idle time of the schedule before and after the last operation was scheduled. The idle time is the sum of the gaps between the end of one operation and the start of the next on each machine.
- Parameters:
dispatcher (Dispatcher)
subscribe (bool)
- update(scheduled_operation)[source]¶
Called when an operation is scheduled on a machine.
- Parameters:
scheduled_operation (ScheduledOperation)
- class GanttChartWrapperConfig[source]¶
Bases: TypedDict
Configuration for creating the plot function with the plot_gantt_chart_wrapper function.
- title: str | None¶
- cmap: str¶
- show_available_operations: bool¶
- class GifConfig[source]¶
Bases: dict
Configuration for creating the GIF using the create_gannt_chart_video function.
- gif_path: str | None¶
- fps: int¶
- remove_frames: bool¶
- frames_dir: str | None¶
- plot_current_time: bool¶
- class VideoConfig[source]¶
Bases: TypedDict
Configuration for creating the video using the create_gannt_chart_video function.
- video_path: str | None¶
- fps: int¶
- remove_frames: bool¶
- frames_dir: str | None¶
- plot_current_time: bool¶
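Both configuration types are plain dictionaries at runtime, so they can be written as literals. A sketch with illustrative values (paths and frame rates are arbitrary):
from job_shop_lib.reinforcement_learning import GifConfig, VideoConfig

# Illustrative values; the keys follow the fields documented above.
gif_config: GifConfig = {
    "gif_path": "gantt_chart.gif",
    "fps": 2,
    "remove_frames": True,
    "frames_dir": None,
    "plot_current_time": True,
}
video_config: VideoConfig = {
    "video_path": "gantt_chart.mp4",
    "fps": 4,
    "remove_frames": True,
    "frames_dir": None,
    "plot_current_time": True,
}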
- class SingleJobShopGraphEnv(job_shop_graph, feature_observer_configs, reward_function_config=DispatcherObserverConfig(class_type=<class 'job_shop_lib.reinforcement_learning._reward_observers.MakespanReward'>, kwargs={}), graph_updater_config=DispatcherObserverConfig(class_type=<class 'job_shop_lib.graphs.graph_updaters._residual_graph_updater.ResidualGraphUpdater'>, kwargs={}), ready_operations_filter=<function filter_dominated_operations>, render_mode=None, render_config=None, use_padding=True)[source]¶
Bases: Env
A Gymnasium environment for solving a specific instance of the Job Shop Scheduling Problem represented as a graph.
This environment manages the scheduling process for a single Job Shop instance, using a graph representation and various observers to track the state and compute rewards.
- Observation Space:
A dictionary with the following keys:
- "removed_nodes": Binary vector indicating removed graph nodes.
- "edge_index": Matrix of graph edges in COO format.
- Feature matrices: Keys corresponding to the composite observer features (e.g., "operations", "jobs", "machines").
- Action Space:
MultiDiscrete space representing (job_id, machine_id) pairs.
- Render Modes:
"human": Displays the current Gantt chart.
"save_video": Saves a video of the complete Gantt chart.
"save_gif": Saves a GIF of the complete Gantt chart.
- Parameters:
job_shop_graph (JobShopGraph)
feature_observer_configs (Sequence[DispatcherObserverConfig[type[FeatureObserver]] | DispatcherObserverConfig[FeatureObserverType] | DispatcherObserverConfig[str]])
reward_function_config (DispatcherObserverConfig[type[RewardObserver]])
graph_updater_config (DispatcherObserverConfig[type[GraphUpdater]])
ready_operations_filter (Callable[[Dispatcher, list[Operation]], list[Operation]] | None)
render_mode (str | None)
render_config (RenderConfig | None)
use_padding (bool)
- dispatcher¶
Manages the scheduling process. See Dispatcher.
- composite_observer¶
A CompositeFeatureObserver which aggregates features from multiple observers.
- graph_updater¶
Updates the graph representation after each action. See GraphUpdater.
- reward_function¶
Computes rewards for actions taken. See RewardObserver.
- action_space¶
Defines the action space. The action is a tuple of two integers (job_id, machine_id). The machine_id can be -1 if the selected operation can only be scheduled on one machine.
- observation_space¶
Defines the observation space. The observation is a dictionary with the following keys:
- "removed_nodes": Binary vector indicating removed graph nodes.
- "edge_index": Matrix of graph edges in COO format.
- Feature matrices: Keys corresponding to the composite observer features (e.g., "operations", "jobs", "machines").
- render_mode¶
The mode for rendering the environment ("human", "save_video", "save_gif").
- gantt_chart_creator¶
Creates Gantt chart visualizations. See GanttChartCreator.
- use_padding¶
Whether to use padding in observations. Padding maintains a constant observation shape across steps.
- metadata: dict[str, Any] = {'render_modes': ['human', 'save_video', 'save_gif']}¶
- __init__(job_shop_graph, feature_observer_configs, reward_function_config=DispatcherObserverConfig(class_type=<class 'job_shop_lib.reinforcement_learning._reward_observers.MakespanReward'>, kwargs={}), graph_updater_config=DispatcherObserverConfig(class_type=<class 'job_shop_lib.graphs.graph_updaters._residual_graph_updater.ResidualGraphUpdater'>, kwargs={}), ready_operations_filter=<function filter_dominated_operations>, render_mode=None, render_config=None, use_padding=True)[source]¶
Initializes the SingleJobShopGraphEnv environment.
- Parameters:
job_shop_graph (JobShopGraph) -- The JobShopGraph instance representing the job shop problem.
feature_observer_configs (Sequence[DispatcherObserverConfig[type[FeatureObserver]] | DispatcherObserverConfig[FeatureObserverType] | DispatcherObserverConfig[str]]) -- A list of FeatureObserverConfig instances for the feature observers.
reward_function_config (DispatcherObserverConfig[type[RewardObserver]]) -- The configuration for the reward function.
graph_updater_config (DispatcherObserverConfig[type[GraphUpdater]]) -- The configuration for the graph updater.
ready_operations_filter (Callable[[Dispatcher, list[Operation]], list[Operation]] | None) -- The function to use for pruning dominated operations.
render_mode (str | None) -- The mode for rendering the environment ("human", "save_video", "save_gif").
render_config (RenderConfig | None) -- Configuration for rendering (e.g., paths for saving videos or GIFs).
use_padding (bool) -- Whether to use padding for the edge index.
- Return type:
None
- property instance: JobShopInstance¶
Returns the instance the environment is working on.
- property job_shop_graph: JobShopGraph¶
Returns the job shop graph.
- reset(*, seed=None, options=None)[source]¶
Resets the environment.
- Parameters:
seed (int | None)
options (dict[str, Any] | None)
- Return type:
tuple[ObservationDict, dict]
- step(action)[source]¶
Takes a step in the environment.
- Parameters:
action (tuple[int, int]) -- The action to take. The action is a tuple of two integers (job_id, machine_id): the job ID and the machine ID in which to schedule the operation.
- Returns:
A tuple containing the following elements:
- The observation of the environment.
- The reward obtained.
- Whether the environment is done.
- Whether the episode was truncated (always False).
- A dictionary with additional information, containing the following keys:
"feature_names": The names of the features in the observation.
"available_operations": The operations that are ready to be scheduled.
- Return type:
tuple[ObservationDict, float, bool, bool, dict]
- render()[source]¶
Renders the environment.
The rendering mode is set by the render_mode attribute:
- "human": Renders the current Gantt chart.
- "save_video": Saves a video of the Gantt chart. Used only if the schedule is completed.
- "save_gif": Saves a GIF of the Gantt chart. Used only if the schedule is completed.
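Putting the documented signatures together, a minimal usage sketch. The FeatureObserverType.DURATION member, the Operation constructor arguments, and the assumption that the info dictionary carries "available_operations" after reset() are illustrative; check them against your installed version:
from job_shop_lib import JobShopInstance, Operation
from job_shop_lib.dispatching.feature_observers import FeatureObserverType
from job_shop_lib.graphs import build_agent_task_graph
from job_shop_lib.reinforcement_learning import SingleJobShopGraphEnv

# A toy 2-job, 2-machine instance; Operation(machines, duration) is assumed.
job_0 = [Operation(0, 3), Operation(1, 2)]
job_1 = [Operation(1, 4), Operation(0, 1)]
instance = JobShopInstance([job_0, job_1], name="toy_instance")

env = SingleJobShopGraphEnv(
    job_shop_graph=build_agent_task_graph(instance),
    feature_observer_configs=[FeatureObserverType.DURATION],
)

obs, info = env.reset()
done = False
while not done:
    # Greedy policy: schedule the first ready operation on its machine.
    operation = info["available_operations"][0]
    action = (operation.job_id, operation.machine_id)
    obs, reward, done, truncated, info = env.step(action)

# Sum of per-step rewards; with the default MakespanReward this equals
# the negative final makespan.
print(sum(env.reward_function.rewards))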
- class RenderConfig[source]¶
Bases: TypedDict
Configuration needed to initialize the GanttChartCreator class.
- gantt_chart_wrapper_config: GanttChartWrapperConfig¶
- video_config: VideoConfig¶
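A sketch of a complete RenderConfig literal, reusing the configuration types defined above (values are illustrative):
from job_shop_lib.reinforcement_learning import RenderConfig

render_config: RenderConfig = {
    "gantt_chart_wrapper_config": {
        "title": "Toy instance",
        "cmap": "viridis",
        "show_available_operations": True,
    },
    "video_config": {
        "video_path": "gantt_chart.mp4",
        "fps": 4,
        "remove_frames": True,
        "frames_dir": None,
        "plot_current_time": True,
    },
}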
- class ObservationDict[source]¶
Bases: dict
A dictionary containing the observation of the environment.
- Required fields:
removed_nodes (np.ndarray): Binary vector indicating removed nodes.
edge_index (np.ndarray): Edge list in COO format.
- Optional fields:
operations (np.ndarray): Matrix of operation features.
jobs (np.ndarray): Matrix of job features.
machines (np.ndarray): Matrix of machine features.
- removed_nodes: ndarray¶
- edge_index: ndarray¶
- operations: ndarray¶
- jobs: ndarray¶
- machines: ndarray¶
- add_padding(array, output_shape, padding_value=-1, dtype=None)[source]¶
Adds padding to the array.
Pads the input array to the specified output shape with a given padding value. If dtype is not specified, the dtype of the input array is used.
- Parameters:
array (ndarray[Any, dtype[Any]]) -- The input array to be padded.
output_shape (tuple[int, ...]) -- The desired shape of the output array.
padding_value (float) -- The value to use for padding. Defaults to -1.
dtype (type[T] | None) -- The data type for the output array. Defaults to None, in which case the dtype of the input array is used.
- Returns:
The padded array with the specified output shape.
- Raises:
ValidationError -- If the output shape is smaller than the input shape.
- Return type:
ndarray[Any, dtype[T]]
Examples:
>>> array = np.array([[1, 2], [3, 4]])
>>> add_padding(array, (3, 3))
array([[ 1,  2, -1],
       [ 3,  4, -1],
       [-1, -1, -1]])
>>> add_padding(array, (3, 3), padding_value=0)
array([[1, 2, 0],
       [3, 4, 0],
       [0, 0, 0]])
>>> bool_array = np.array([[True, False], [False, True]])
>>> add_padding(bool_array, (3, 3), padding_value=False, dtype=int)
array([[1, 0, 0],
       [0, 1, 0],
       [0, 0, 0]])
>>> add_padding(bool_array, (3, 3), dtype=int)
array([[ 1,  0, -1],
       [ 0,  1, -1],
       [-1, -1, -1]])
- class MultiJobShopGraphEnv(instance_generator, feature_observer_configs, graph_initializer=<function build_agent_task_graph>, graph_updater_config=DispatcherObserverConfig(class_type=<class 'job_shop_lib.graphs.graph_updaters._residual_graph_updater.ResidualGraphUpdater'>, kwargs={}), ready_operations_filter=<function filter_dominated_operations>, reward_function_config=DispatcherObserverConfig(class_type=<class 'job_shop_lib.reinforcement_learning._reward_observers.MakespanReward'>, kwargs={}), render_mode=None, render_config=None, use_padding=True)[source]¶
Bases: Env
Gymnasium environment for solving multiple Job Shop Scheduling Problems using reinforcement learning and Graph Neural Networks.
This environment generates a new Job Shop Scheduling Problem instance for each reset, creates a graph representation, and manages the scheduling process using a Dispatcher.
The observation space includes:
removed_nodes: Binary vector indicating removed nodes.
edge_index: Edge list in COO format.
operations: Matrix of operation features.
jobs: Matrix of job features (if applicable).
machines: Matrix of machine features (if applicable).
Internally, the class creates a SingleJobShopGraphEnv environment to manage the scheduling process for each JobShopInstance.
- Parameters:
instance_generator (InstanceGenerator)
feature_observer_configs (Sequence[DispatcherObserverConfig[type[FeatureObserver]] | DispatcherObserverConfig[FeatureObserverType] | DispatcherObserverConfig[str]])
graph_initializer (Callable[[JobShopInstance], JobShopGraph])
graph_updater_config (DispatcherObserverConfig[type[GraphUpdater]])
ready_operations_filter (Callable[[Dispatcher, list[Operation]], list[Operation]])
reward_function_config (DispatcherObserverConfig[type[RewardObserver]])
render_mode (str | None)
render_config (RenderConfig | None)
use_padding (bool)
- instance_generator¶
An InstanceGenerator that generates a new problem instance on each reset.
- action_space¶
A gymnasium.spaces.Discrete action space with size equal to the maximum number of jobs.
- observation_space¶
Dictionary of observation spaces. Keys are defined in ObservationSpaceKey.
- single_job_shop_graph_env¶
Environment for a specific Job Shop Scheduling Problem instance. See SingleJobShopGraphEnv.
- graph_initializer¶
Function to create the initial graph representation. It should take a JobShopInstance as input and return a JobShopGraph.
- render_mode¶
Rendering mode for visualization. Supported modes are:
- "human": Renders the current Gantt chart.
- "save_video": Saves a video of the Gantt chart. Used only if the schedule is completed.
- "save_gif": Saves a GIF of the Gantt chart. Used only if the schedule is completed.
- render_config¶
Configuration for rendering. See RenderConfig.
- feature_observer_configs¶
List of DispatcherObserverConfig for feature observers.
- reward_function_config¶
Configuration for the reward function. See DispatcherObserverConfig and RewardObserver.
- graph_updater_config¶
Configuration for the graph updater. The graph updater is used to update the graph representation after each action. See DispatcherObserverConfig and GraphUpdater.
- __init__(instance_generator, feature_observer_configs, graph_initializer=<function build_agent_task_graph>, graph_updater_config=DispatcherObserverConfig(class_type=<class 'job_shop_lib.graphs.graph_updaters._residual_graph_updater.ResidualGraphUpdater'>, kwargs={}), ready_operations_filter=<function filter_dominated_operations>, reward_function_config=DispatcherObserverConfig(class_type=<class 'job_shop_lib.reinforcement_learning._reward_observers.MakespanReward'>, kwargs={}), render_mode=None, render_config=None, use_padding=True)[source]¶
Initializes the environment.
- Parameters:
instance_generator (InstanceGenerator) -- An InstanceGenerator that generates a new problem instance on each reset.
feature_observer_configs (Sequence[DispatcherObserverConfig[type[FeatureObserver]] | DispatcherObserverConfig[FeatureObserverType] | DispatcherObserverConfig[str]]) -- Configurations for feature observers. Each configuration should be a DispatcherObserverConfig with a class type that inherits from FeatureObserver, or a string or enum that represents a built-in feature observer.
graph_initializer (Callable[[JobShopInstance], JobShopGraph]) -- Function to create the initial graph representation. If None, the default graph initializer is used: build_agent_task_graph().
graph_updater_config (DispatcherObserverConfig[type[GraphUpdater]]) -- Configuration for the graph updater. The graph updater is used to update the graph representation after each action. If None, the default graph updater is used: ResidualGraphUpdater.
ready_operations_filter (Callable[[Dispatcher, list[Operation]], list[Operation]]) -- Function to filter ready operations. If None, the default filter is used: filter_dominated_operations().
reward_function_config (DispatcherObserverConfig[type[RewardObserver]]) -- Configuration for the reward function. If None, the default reward function is used: MakespanReward.
render_mode (str | None) -- Rendering mode for visualization. Supported modes are:
- "human": Renders the current Gantt chart.
- "save_video": Saves a video of the Gantt chart. Used only if the schedule is completed.
- "save_gif": Saves a GIF of the Gantt chart. Used only if the schedule is completed.
render_config (RenderConfig | None) -- Configuration for rendering. See RenderConfig.
use_padding (bool) -- Whether to use padding in observations. If True, all matrices are padded to fixed sizes based on the maximum instance size. Values are padded with -1, except for the "removed_nodes" key, which is padded with True, indicating that the node is removed.
- Return type:
None
- property dispatcher: Dispatcher¶
Returns the current dispatcher instance.
- property reward_function: RewardObserver¶
Returns the current reward function instance.
- property ready_operations_filter: Callable[[Dispatcher, list[Operation]], list[Operation]] | None¶
Returns the current ready operations filter.
- property use_padding: bool¶
Returns whether padding is used in observations.
- property job_shop_graph: JobShopGraph¶
Returns the current job shop graph.
- property instance: JobShopInstance¶
Returns the current job shop instance.
- reset(*, seed=None, options=None)[source]¶
Resets the environment and returns the initial observation.
- Parameters:
seed (int | None) -- Random seed for reproducibility.
options (dict[str, Any] | None) -- Additional options for reset (currently unused).
- Returns:
A tuple containing:
- ObservationDict: The initial observation of the environment.
- dict: An info dictionary containing additional information about the reset state. This may include details about the generated instance or initial graph structure.
- Return type:
tuple[ObservationDict, dict]
- step(action)[source]¶
Takes a step in the environment.
- Parameters:
action (tuple[int, int]) -- The action to take. The action is a tuple of two integers (job_id, machine_id): the job ID and the machine ID in which to schedule the operation.
- Returns:
A tuple containing the following elements:
- The observation of the environment.
- The reward obtained.
- Whether the environment is done.
- Whether the episode was truncated (always False).
- A dictionary with additional information, containing the following keys:
"feature_names": The names of the features in the observation.
"available_operations": The operations that are ready to be scheduled.
- Return type:
tuple[ObservationDict, float, bool, bool, dict]
- render()[source]¶
Compute the render frames as specified by render_mode during the initialization of the environment.
The environment's metadata render modes (env.metadata["render_modes"]) should contain the possible ways to implement the render modes. In addition, list versions for most render modes are achieved through gymnasium.make, which automatically applies a wrapper to collect rendered frames.
Note
As the render_mode is known during __init__, the objects used to render the environment state should be initialised in __init__.
By convention, if the render_mode is:
- None (default): no render is computed.
- "human": The environment is continuously rendered in the current display or terminal, usually for human consumption. This rendering should occur during step(), and render() doesn't need to be called. Returns None.
- "rgb_array": Returns a single frame representing the current state of the environment. A frame is an np.ndarray with shape (x, y, 3) representing RGB values for an x-by-y pixel image.
- "ansi": Returns a string (str) or a StringIO.StringIO containing a terminal-style text representation for each time step. The text can include newlines and ANSI escape sequences (e.g. for colors).
- "rgb_array_list" and "ansi_list": List-based versions of the render modes are possible (except "human") through the wrapper gymnasium.wrappers.RenderCollection, which is automatically applied during gymnasium.make(..., render_mode="rgb_array_list"). The frames collected are popped after render() is called or after reset().
Note
Make sure that your class's metadata "render_modes" key includes the list of supported modes.
Changed in version 0.25.0: The render function was changed to no longer accept parameters; rather, these parameters should be specified in the environment initialisation, i.e., gymnasium.make("CartPole-v1", render_mode="human").
- Return type:
None
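A minimal usage sketch for the multi-instance environment. GeneralInstanceGenerator and its keyword arguments are assumptions about the job_shop_lib.generation module; substitute the InstanceGenerator subclass you actually use:
from job_shop_lib.dispatching.feature_observers import FeatureObserverType
from job_shop_lib.generation import GeneralInstanceGenerator
from job_shop_lib.reinforcement_learning import MultiJobShopGraphEnv

# Assumed generator parameters: instances with 3-6 jobs and 3-5 machines.
generator = GeneralInstanceGenerator(num_jobs=(3, 6), num_machines=(3, 5))
env = MultiJobShopGraphEnv(
    instance_generator=generator,
    feature_observer_configs=[FeatureObserverType.DURATION],  # assumed member
)

# Each reset draws a fresh instance; with use_padding=True (the default),
# observation arrays keep a fixed maximum size across episodes.
obs, info = env.reset(seed=42)
print(env.instance)  # the freshly generated JobShopInstance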