On Applying Q-Learning to Optimize Power Allocation in 2-users NOMA System

Phetnakorn Aermsa-Ard, Chonticha Wangsamad, Kritsada Mamat

Abstract

This article considers power-domain non-orthogonal multiple access (NOMA), a multiple access technique under consideration for 5G wireless systems and beyond. Successive interference cancellation (SIC) is applied to decode each user's signal, and the transmit power allocation significantly affects system performance. We propose applying Q-learning, a machine learning method, to the transmit power allocation problem in a 2-user NOMA system, where the objective is to maximize the minimum transmission rate. We show how to map the NOMA system onto the Q-learning components, namely agent, action, stage, reward, and environment, which are central to the learning process. Numerical results show that Q-learning power allocation learns to increase the reward at each step. In terms of system performance, the bit rates of the two users are very close to each other when Q-learning is applied. Furthermore, Q-learning achieves a higher minimum rate than the dynamic power allocation methods in the literature and the optimizers in Python's libraries.
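The mapping described in the abstract can be illustrated with a minimal sketch. Everything specific here is an assumption for illustration and is not taken from the article: a downlink channel with hypothetical gains `g_weak` and `g_strong`, unit noise power, a discretized power fraction as the state, three actions (lower, keep, raise the fraction), and the minimum of the two user rates as the reward. The function names and all parameter values are likewise hypothetical.

```python
import math
import random


def user_rates(a, p_total, g_weak, g_strong, noise=1.0):
    """Rates in bit/s/Hz when a fraction `a` of the power goes to the weak user.

    Assumed model: the weak user treats the strong user's signal as
    interference; the strong user removes the weak user's signal via SIC.
    """
    sinr_weak = a * p_total * g_weak / ((1 - a) * p_total * g_weak + noise)
    snr_strong = (1 - a) * p_total * g_strong / noise
    return math.log2(1 + sinr_weak), math.log2(1 + snr_strong)


def q_learning_power_split(p_total=10.0, g_weak=0.2, g_strong=1.0, levels=21,
                           episodes=2000, steps=30, alpha=0.5, gamma=0.9,
                           epsilon=0.2, seed=0):
    """Tabular Q-learning over a grid of power fractions (illustrative only)."""
    rng = random.Random(seed)
    # States: candidate power fractions strictly inside (0, 1).
    grid = [(i + 1) / (levels + 1) for i in range(levels)]
    moves = (-1, 0, 1)  # actions: lower, keep, or raise the fraction
    q = [[0.0] * len(moves) for _ in grid]

    def step(state, action):
        # Environment: clamp to the grid, reward is the minimum user rate.
        nxt = min(max(state + moves[action], 0), levels - 1)
        return nxt, min(user_rates(grid[nxt], p_total, g_weak, g_strong))

    def greedy(state):
        return max(range(len(moves)), key=lambda k: q[state][k])

    for _ in range(episodes):
        s = rng.randrange(levels)
        for _ in range(steps):
            # Epsilon-greedy action selection.
            act = rng.randrange(len(moves)) if rng.random() < epsilon else greedy(s)
            s_next, reward = step(s, act)
            # Standard Q-learning update.
            q[s][act] += alpha * (reward + gamma * max(q[s_next]) - q[s][act])
            s = s_next

    # Follow the learned greedy policy from the middle of the grid.
    s = levels // 2
    for _ in range(levels):
        s, _ = step(s, greedy(s))
    return grid[s], user_rates(grid[s], p_total, g_weak, g_strong)


if __name__ == "__main__":
    fraction, (rate_weak, rate_strong) = q_learning_power_split()
    print(f"fraction to weak user: {fraction:.3f}")
    print(f"rates: weak={rate_weak:.3f}, strong={rate_strong:.3f} bit/s/Hz")
```

Because the reward is the smaller of the two rates, a policy that maximizes it tends to push the two rates toward each other, which matches the qualitative behavior the abstract reports. The grid resolution and learning parameters are arbitrary choices in this sketch.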

Keywords

NOMA; power allocation; Q-Learning

DOI: 10.14416/j.ind.tech.2023.04.004