Preparation
This example follows the Hummingbirds project linked below. Along the way we update the project to work with the latest ML-Agents version and study the important points.
ML-Agents: Hummingbirds - Unity Learn
Reinforcement Learning is one of the most exciting types of Artificial Intelligence and the Unity ML-Agents project is one of the easiest and most fun ways to get started. The instructor for the course is Adam Kelly, an experienced Unity developer who has
learn.unity.com
Errors
If you follow the example above, you will run into the errors described below. This post records how to fix them.
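1. Dropping in the project and code provided by the course produces errors, because they were written against an older version of ML-Agents. In current versions, OnActionReceived and Heuristic take ActionBuffers (from Unity.MLAgents.Actuators) instead of float[]. Changing the code as follows resolves the errors.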
using Unity.MLAgents.Actuators;

public override void OnActionReceived(ActionBuffers vectorAction)
{
    // Don't take actions if frozen
    if (frozen) return;

    // Calculate movement vector
    Vector3 move = new(vectorAction.ContinuousActions[0], vectorAction.ContinuousActions[1], vectorAction.ContinuousActions[2]);

    // Add force in the direction of the move vector
    rigidbody.AddForce(move * moveForce);

    // Get the current rotation
    Vector3 rotationVector = transform.rotation.eulerAngles;

    // Calculate pitch and yaw rotation
    float pitchChange = vectorAction.ContinuousActions[3];
    float yawChange = vectorAction.ContinuousActions[4];

    // Calculate smooth rotation changes
    smoothPitchChange = Mathf.MoveTowards(smoothPitchChange, pitchChange, 2f * Time.fixedDeltaTime);
    smoothYawChange = Mathf.MoveTowards(smoothYawChange, yawChange, 2f * Time.fixedDeltaTime);

    // Calculate new pitch and yaw based on smoothed values
    // Clamp pitch to avoid flipping upside down
    float pitch = rotationVector.x + smoothPitchChange * Time.fixedDeltaTime * pitchSpeed;
    if (pitch > 180f) pitch -= 360f;
    pitch = Mathf.Clamp(pitch, -MaxPitchAngle, MaxPitchAngle);

    float yaw = rotationVector.y + smoothYawChange * Time.fixedDeltaTime * yawSpeed;

    // Apply the new rotation
    transform.rotation = Quaternion.Euler(pitch, yaw, 0f);
}
public override void Heuristic(in ActionBuffers actionsOut)
{
    // Create placeholders for all movement/turning
    Vector3 forward = Vector3.zero;
    Vector3 left = Vector3.zero;
    Vector3 up = Vector3.zero;
    float pitch = 0f;
    float yaw = 0f;

    // Convert keyboard inputs to movement and turning
    // All values should be between -1 and +1

    // Forward/backward
    if (Input.GetKey(KeyCode.W)) forward = transform.forward;
    else if (Input.GetKey(KeyCode.S)) forward = -transform.forward;

    // Left/right
    if (Input.GetKey(KeyCode.A)) left = -transform.right;
    else if (Input.GetKey(KeyCode.D)) left = transform.right;

    // Up/down
    if (Input.GetKey(KeyCode.E)) up = transform.up;
    else if (Input.GetKey(KeyCode.C)) up = -transform.up;

    // Pitch up/down
    if (Input.GetKey(KeyCode.UpArrow)) pitch = 1f;
    else if (Input.GetKey(KeyCode.DownArrow)) pitch = -1f;

    // Turn left/right
    if (Input.GetKey(KeyCode.LeftArrow)) yaw = -1f;
    else if (Input.GetKey(KeyCode.RightArrow)) yaw = 1f;

    // Combine the movement vectors and normalize
    Vector3 combined = (forward + left + up).normalized;

    // Add the 3 movement values, pitch, and yaw to the actionsOut array
    var continuousActionsOut = actionsOut.ContinuousActions;
    continuousActionsOut[0] = combined.x;
    continuousActionsOut[1] = combined.y;
    continuousActionsOut[2] = combined.z;
    continuousActionsOut[3] = pitch;
    continuousActionsOut[4] = yaw;
}
Change the two functions as shown above.
2. For the same reason, the training configuration file no longer matches the format expected by the latest version. Change it as follows.
default_settings: null
behaviors:
  Hummingbird:
    trainer_type: ppo
    hyperparameters:
      batch_size: 2048
      buffer_size: 20480
      learning_rate: 0.0003
      beta: 0.005
      epsilon: 0.2
      lambd: 0.95
      num_epoch: 3
      shared_critic: false
      learning_rate_schedule: linear
      beta_schedule: linear
      epsilon_schedule: linear
    checkpoint_interval: 5000000
    network_settings:
      normalize: false
      hidden_units: 256
      num_layers: 2
      vis_encode_type: simple
      memory: null
      goal_conditioning_type: hyper
      deterministic: false
    reward_signals:
      extrinsic:
        gamma: 0.99
        strength: 1.0
        network_settings:
          normalize: false
          hidden_units: 128
          num_layers: 2
          vis_encode_type: simple
          memory: null
          goal_conditioning_type: hyper
          deterministic: false
    init_path: null
    keep_checkpoints: 5
    even_checkpoints: false
    max_steps: 5000000
    time_horizon: 128
    summary_freq: 50000
    threaded: false
    self_play: null
    behavioral_cloning: null
env_settings:
  env_path: null
  env_args: null
  base_port: 5005
  num_envs: 1
  num_areas: 1
  timeout_wait: 60
  seed: -1
  max_lifetime_restarts: 10
  restarts_rate_limit_n: 1
  restarts_rate_limit_period_s: 60
engine_settings:
  width: 84
  height: 84
  quality_level: 5
  time_scale: 20
  target_frame_rate: -1
  capture_frame_rate: 60
  no_graphics: false
  no_graphics_monitor: false
environment_parameters: null
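The full file above spells out every setting. As a rough sketch, a hand-written config can usually be much shorter. The version below keeps only the behaviors section and the core PPO values from the file above. It assumes that the trainer falls back to its built-in defaults for any key that is left out, so check the omitted values against the ML-Agents version you have installed.

```yaml
# Trimmed sketch of the config above. Omitted keys are assumed to
# fall back to the trainer defaults of the installed ML-Agents version.
behaviors:
  Hummingbird:              # must match the Behavior Name on the agent
    trainer_type: ppo
    hyperparameters:
      batch_size: 2048
      buffer_size: 20480
      learning_rate: 0.0003
      beta: 0.005
      epsilon: 0.2
      lambd: 0.95
      num_epoch: 3
      learning_rate_schedule: linear
    network_settings:
      normalize: false
      hidden_units: 256
      num_layers: 2
      vis_encode_type: simple
    reward_signals:
      extrinsic:
        gamma: 0.99
        strength: 1.0
    keep_checkpoints: 5
    max_steps: 5000000
    time_horizon: 128
    summary_freq: 50000
```

Either file is passed to the trainer as the first argument of mlagents-learn, together with a --run-id of your choice.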