Thesis Title
Prediction and Planning Using Concepts
- Field of Study
- Computer Engineering - Artificial Intelligence and Robotics
- Degree
- Master's (M.Sc.)
- Place of Defense
- Library of the Faculty of Electrical and Computer Engineering, registration no. E1504; Central Library, Information Hall, registration no. 40155; Central Library of Technical Campus 2, registration no. E1504
- Date of Defense
- Azar 20, 1387 (December 10, 2008)
- Student
- Habib Karbasian
- Supervisors
- Majid Nili Ahmadabadi, Babak Nadjar Araabi
- Abstract (Persian)
- In this research, inspired by recent findings in cognitive science on mirror neurons, this idea is applied within a reinforcement learning framework together with a predictive representation known as the temporal-difference network. The resulting structure enables an artificial agent, with no prior knowledge and using only the responses its surroundings give to its behavior, to extract abstract concepts from its environment.
- Abstract (English)
- In this research, we propose two complementary frameworks. The first gives a reinforcement learning (RL) agent the ability to abstract its experience into meaningful concepts; the second uses the outcome of the first to ease learning of the same task in a new environment. In the first phase, we propose an approach whereby an RL agent comes to understand its environment through meaningful, temporally extended concepts in an unsupervised way. Our approach is inspired by findings in neuroscience on the role of mirror neurons in action-based abstraction, combined with a newly proposed temporal-difference network (TDN). To direct the agent to gather information that is fertile for concept learning, a reinforcement learning mechanism that exploits the agent's experience is proposed. After this phase, the agent is ready to model its surroundings via the new TDN. The final step is to extract those temporally extended concepts that yield better expected rewards in terms of Q-values. In the second phase, we use the knowledge gathered throughout the first phase to learn the same task in a new environment. More specifically, two action-selection algorithms are proposed on top of the new TDN. The first depends solely on the TDN, while the second not only utilizes the TDN but also takes advantage of the concepts extracted by the first framework as prior knowledge. Both algorithms outperform well-known action-selection algorithms in terms of average expected reward. Simulation results demonstrate the capability of the proposed approaches to retrieve meaningful concepts from an environment and to use them to hasten learning in a new arena.
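The abstract names two concrete mechanisms: a temporal-difference network whose nodes answer predictive questions about future observations, and a filtering step that keeps only those temporally extended concepts whose Q-values indicate above-average expected reward, later reused to bias action selection in a new environment. The following is a minimal Python sketch of these ideas under strong simplifying assumptions (one-step predictive questions, a tabular Q over state-option pairs, and a mean-plus-margin threshold for concept extraction); the thesis does not publish code, and every name here (`TinyTDN`, `extract_concepts`, `select_action`) is illustrative rather than the author's implementation.

```python
import numpy as np

# Phase 1a: a tiny temporal-difference network (TDN).
# Assumption: each node answers a one-step question -- "how likely is
# observation bit j if I take action a?" -- far simpler than a full TDN
# with compositional questions, but enough to show the TD-style update.
class TinyTDN:
    def __init__(self, n_obs_bits, n_actions, lr=0.1):
        self.predictions = np.full((n_actions, n_obs_bits), 0.5)
        self.lr = lr

    def update(self, action, next_obs_bits):
        # Move each prediction toward the realized answer (the next
        # observation): the core temporal-difference step.
        target = np.asarray(next_obs_bits, dtype=float)
        self.predictions[action] += self.lr * (target - self.predictions[action])

# Phase 1b: keep only (state, option) pairs whose learned Q-value clearly
# beats the average -- one plausible reading of "extract those temporally
# extended concepts which yield better expected rewards in terms of Q values".
def extract_concepts(Q, margin=0.1):
    """Q: dict mapping (state, option) -> estimated return."""
    mean_q = np.mean(list(Q.values()))
    return {key for key, q in Q.items() if q > mean_q + margin}

# Phase 2: epsilon-greedy selection that prefers options surviving the
# concept filter, i.e. uses phase-1 concepts as prior knowledge.
def select_action(state, Q, concepts, options, epsilon=0.1, rng=None):
    rng = rng or np.random.default_rng()
    if rng.random() < epsilon:
        return options[rng.integers(len(options))]
    preferred = [o for o in options if (state, o) in concepts]
    pool = preferred or options  # fall back to all options if no concept fits
    return max(pool, key=lambda o: Q.get((state, o), 0.0))

if __name__ == "__main__":
    # Toy check: action 1 reliably lights up observation bit 0, so the TDN
    # prediction for (action=1, bit=0) should drift toward ~0.9.
    rng = np.random.default_rng(0)
    tdn = TinyTDN(n_obs_bits=4, n_actions=2)
    for _ in range(2000):
        a = int(rng.integers(2))
        p = 0.9 if a == 1 else 0.1
        obs = rng.random(4) < np.array([p, 0.05, 0.05, 0.05])
        tdn.update(a, obs)
    print(tdn.predictions.round(2))

    Q = {("s0", "go-to-door"): 1.2, ("s0", "spin"): 0.1, ("s1", "go-to-door"): 0.9}
    concepts = extract_concepts(Q)
    print(select_action("s0", Q, concepts, ["go-to-door", "spin"], epsilon=0.0))
```

Everything is tabular purely to keep the sketch self-contained and runnable; the thesis's TDN (with temporally extended questions) and its two action-selection algorithms are richer than this reduction.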