openai_ros Tutorial (ROS + Gazebo Deep Reinforcement Learning)
1. Environment Setup
Test environment: Ubuntu 16.04, ROS Kinetic.

Download openai_ros:

git clone https://bitbucket.org/theconstructcore/openai_ros.git
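A typical way to build it from source is to clone it into a catkin workspace and build (a sketch; it assumes the standard workspace path ~/catkin_ws, so adjust it to your own setup):

cd ~/catkin_ws/src
git clone https://bitbucket.org/theconstructcore/openai_ros.git
cd ~/catkin_ws
catkin_make
source devel/setup.bash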
openai_ros depends on the following packages: message_runtime, rospy, gazebo_msgs, std_msgs, geometry_msgs, controller_manager_msgs. If any of them is missing, launching the examples will fail with a missing-package error.
Install controller_manager_msgs:

sudo apt-get install ros-kinetic-controller-manager-msgs
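Instead of installing dependencies one by one, rosdep can resolve everything declared in the workspace's package.xml files (this assumes rosdep has already been initialised on the machine):

cd ~/catkin_ws
rosdep install --from-paths src --ignore-src -r -y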
2. Creating a TurtleBot Training Script with openai_ros
Create a my_turtlebot2_training package under catkin_ws/src.
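One way to create it (a sketch; it assumes the workspace lives at ~/catkin_ws and declares rospy and openai_ros as dependencies):

cd ~/catkin_ws/src
catkin_create_pkg my_turtlebot2_training rospy openai_ros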
Then create a Python file named start_training.py inside the package and make it executable with chmod +x:
#!/usr/bin/env python

import gym
import numpy
import time
import qlearn
from gym import wrappers
# ROS packages required
import rospy
import rospkg
from openai_ros.openai_ros_common import StartOpenAI_ROS_Environment

if __name__ == '__main__':

    rospy.init_node('example_turtlebot2_maze_qlearn',
                    anonymous=True, log_level=rospy.WARN)

    # Init OpenAI_ROS ENV
    task_and_robot_environment_name = rospy.get_param(
        '/turtlebot2/task_and_robot_environment_name')
    env = StartOpenAI_ROS_Environment(task_and_robot_environment_name)

    # Create the Gym environment
    rospy.loginfo("Gym environment done")
    rospy.loginfo("Starting Learning")

    # Set the logging system
    rospack = rospkg.RosPack()
    pkg_path = rospack.get_path('my_turtlebot2_training')
    outdir = pkg_path + '/training_results'
    env = wrappers.Monitor(env, outdir, force=True)
    rospy.loginfo("Monitor Wrapper started")

    last_time_steps = numpy.ndarray(0)

    # Loads parameters from the ROS param server
    # Parameters are stored in a yaml file inside the config directory
    # They are loaded at runtime by the launch file
    Alpha = rospy.get_param("/turtlebot2/alpha")
    Epsilon = rospy.get_param("/turtlebot2/epsilon")
    Gamma = rospy.get_param("/turtlebot2/gamma")
    epsilon_discount = rospy.get_param("/turtlebot2/epsilon_discount")
    nepisodes = rospy.get_param("/turtlebot2/nepisodes")
    nsteps = rospy.get_param("/turtlebot2/nsteps")
    running_step = rospy.get_param("/turtlebot2/running_step")

    # Initialises the algorithm that we are going to use for learning
    qlearn = qlearn.QLearn(actions=range(env.action_space.n),
                           alpha=Alpha, gamma=Gamma, epsilon=Epsilon)
    initial_epsilon = qlearn.epsilon

    start_time = time.time()
    highest_reward = 0

    # Starts the main training loop: the one about the episodes to do
    for x in range(nepisodes):
        rospy.logdebug("############### WALL START EPISODE=>" + str(x))

        cumulated_reward = 0
        done = False
        if qlearn.epsilon > 0.05:
            qlearn.epsilon *= epsilon_discount

        # Initialize the environment and get first state of the robot
        observation = env.reset()
        state = ''.join(map(str, observation))

        # Show on screen the actual situation of the robot
        # env.render()
        # for each episode, we test the robot for nsteps
        for i in range(nsteps):
            rospy.logwarn("############### Start Step=>" + str(i))
            # Pick an action based on the current state
            action = qlearn.chooseAction(state)
            rospy.logwarn("Next action is:%d", action)
            # Execute the action in the environment and get feedback
            observation, reward, done, info = env.step(action)

            rospy.logwarn(str(observation) + " " + str(reward))
            cumulated_reward += reward
            if highest_reward < cumulated_reward:
                highest_reward = cumulated_reward

            nextState = ''.join(map(str, observation))

            # Make the algorithm learn based on the results
            rospy.logwarn("# state we were=>" + str(state))
            rospy.logwarn("# action that we took=>" + str(action))
            rospy.logwarn("# reward that action gave=>" + str(reward))
            rospy.logwarn("# episode cumulated_reward=>" + str(cumulated_reward))
            rospy.logwarn("# State in which we will start next step=>" + str(nextState))
            qlearn.learn(state, action, reward, nextState)

            if not (done):
                rospy.logwarn("NOT DONE")
                state = nextState
            else:
                rospy.logwarn("DONE")
                last_time_steps = numpy.append(last_time_steps, [int(i + 1)])
                break
            rospy.logwarn("############### END Step=>" + str(i))
            # raw_input("Next Step...PRESS KEY")
            # rospy.sleep(2.0)

        m, s = divmod(int(time.time() - start_time), 60)
        h, m = divmod(m, 60)
        rospy.logerr(("EP: " + str(x + 1) + " - [alpha: " + str(round(qlearn.alpha, 2)) +
                      " - gamma: " + str(round(qlearn.gamma, 2)) +
                      " - epsilon: " + str(round(qlearn.epsilon, 2)) + "] - Reward: " +
                      str(cumulated_reward) + " Time: %d:%02d:%02d" % (h, m, s)))

    rospy.loginfo(("\n|" + str(nepisodes) + "|" + str(qlearn.alpha) + "|" + str(qlearn.gamma) + "|" +
                   str(initial_epsilon) + "*" + str(epsilon_discount) + "|" +
                   str(highest_reward) + "| PICTURE |"))

    l = last_time_steps.tolist()
    l.sort()

    # print("Parameters: a="+str)
    rospy.loginfo("Overall score: {:0.2f}".format(last_time_steps.mean()))
    rospy.loginfo("Best 100 score: {:0.2f}".format(
        reduce(lambda x, y: x + y, l[-100:]) / len(l[-100:])))

    env.close()

This script sets up a reinforcement-learning training loop and trains the robot with the Q-learning algorithm.
Every training run has an associated configuration file that holds the parameters needed for the task. In the package, create a new folder named config, and inside it create a new file named my_turtlebot2_maze_params.yaml:
turtlebot2: #namespace
    task_and_robot_environment_name: 'MyTurtleBot2Wall-v0'
    ros_ws_abspath: "/home/user/simulation_ws"
    running_step: 0.04 # amount of time the control will be executed
    pos_step: 0.016 # increment in position for each command

    #qlearn parameters
    alpha: 0.1
    gamma: 0.7
    epsilon: 0.9
    epsilon_discount: 0.999
    nepisodes: 500
    nsteps: 10000
    running_step: 0.06 # Time for each step

The parameters fall into two groups:
Environment parameters: those that depend on the RobotEnvironment and TaskEnvironment you have chosen.
RL algorithm parameters: those that depend on the RL algorithm you have chosen.
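After the launch file below loads this YAML, every value lives on the ROS parameter server under the /turtlebot2 namespace, so for a quick experiment you can inspect or override individual values without editing the file, for example:

rosparam get /turtlebot2
rosparam set /turtlebot2/nepisodes 100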
Create a launch folder. Inside it, create a new file named start_training.launch:
<?xml version="1.0" encoding="UTF-8"?>
<launch>
    <!-- This version uses the openai_ros environments -->
    <rosparam command="load" file="$(find my_turtlebot2_training)/config/my_turtlebot2_maze_params.yaml" />
    <!-- Launch the training system -->
    <node pkg="my_turtlebot2_training" name="turtlebot2_maze" type="start_training.py" output="screen"/>
</launch>

Start the training from a terminal:
roslaunch my_turtlebot2_training start_training.launch

Problems you may run into:
The gym module is not installed: ImportError: No module named gym. Install it from source:
git clone https://github.com/openai/gym
cd gym
pip install -e .

The qlearn module is missing.
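The training script imports a local qlearn module that is not part of openai_ros; the original Construct examples ship their own qlearn.py. If you do not have it, below is a minimal stand-in sketch (not the original implementation) of a tabular Q-learning class that matches the interface the script uses (a constructor taking actions/alpha/gamma/epsilon, chooseAction, learn, and an epsilon attribute). Save it as qlearn.py next to start_training.py:

# qlearn.py -- a minimal stand-in, NOT the original Construct implementation.
# It only mirrors the interface that start_training.py expects.
import random


class QLearn:
    def __init__(self, actions, epsilon=0.1, alpha=0.2, gamma=0.9):
        self.q = {}              # Q-table: (state, action) -> value
        self.actions = actions   # iterable of discrete action indices
        self.epsilon = epsilon   # exploration probability
        self.alpha = alpha       # learning rate
        self.gamma = gamma       # discount factor

    def getQ(self, state, action):
        # Unseen state-action pairs default to 0.0
        return self.q.get((state, action), 0.0)

    def chooseAction(self, state):
        # Epsilon-greedy selection over the current Q-values
        if random.random() < self.epsilon:
            return random.choice(list(self.actions))
        q_values = [self.getQ(state, a) for a in self.actions]
        max_q = max(q_values)
        # Break ties between equally good actions at random
        best = [a for a, q in zip(self.actions, q_values) if q == max_q]
        return random.choice(best)

    def learn(self, state1, action1, reward, state2):
        # Q(s,a) <- Q(s,a) + alpha * (r + gamma * max_a' Q(s',a') - Q(s,a))
        max_q_next = max([self.getQ(state2, a) for a in self.actions])
        old_q = self.getQ(state1, action1)
        self.q[(state1, action1)] = old_q + self.alpha * (reward + self.gamma * max_q_next - old_q)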
Summary

This concludes the openai_ros tutorial (ROS + Gazebo deep reinforcement learning); hopefully it helps you solve the problems you run into.