This paper studies the problem of integrated lot-sizing and maintenance decision making in the case of multiple products and stochastic demand. The problem is formulated as a Markov decision process, whose goal is to find a joint production and maintenance policy that minimizes the long-run expected total discounted cost. To this end, the classic Q-learning algorithm is adopted, and a decomposition-based approximate Q-value heuristic is developed to obtain near-optim...