Firms learning price in a monopolistic competition economic model

Silvano Cincotti¹, Eric Guerci, Marco Raberto, Andrea Teglio

DIBE-CINEF, Università di Genova, Via Opera Pia 11a, 16145 Genova, Italy

Abstract
This paper investigates whether monopolistically competitive firms can set the price and the quantity of goods through a reinforcement learning approach. The model incorporates a set of firms that produce and sell differentiated goods in monopolistically competitive markets, and households that supply different types of labor, purchase goods for consumption and hold money. The households behave optimally, maximizing their utility, while the firms learn their desired mark-up by a reinforcement learning algorithm in order to set the price.

Key words: Monopolistic competition, Reinforcement learning

¹ Corresponding author. E-mail: [email protected]

CINEF Working Paper submitted to ECCS2006

5 May 2006

Introduction
In the model proposed in this paper, inspired by Dixit and Stiglitz [1], households decide their goods demand and their labor supply by optimizing a utility function, while firms, facing the households' demand, learn how to set prices by means of a reinforcement learning algorithm. Attempts to investigate how software agents may use reinforcement learning algorithms, such as Q-learning, to make economic decisions are present in the literature (see, for example, [2], [3]). Typically the problem is related to the setting of prices in a competitive marketplace. A population of agents, each trying to adapt in the presence of other adaptive agents, often gives rise to non-stationary and history-dependent problems that are not theoretically guaranteed to converge to any globally optimal solution. For reinforcement learning methods to be more generally useful in solving such problems, they need to be extended to handle these non-Markovian properties.

The contribution of this paper is to incorporate a reinforcement learning approach in a monopolistic competition conceptual framework (see [4]) that already integrates price-taking decisions by economic agents. Differentiated goods and monopolistic competition among the suppliers are considered, as in the New Keynesian framework (see [5], [6]). A similar problem has been approached by [7]-[9] in a context of adaptive least-squares learning. The criterion by which this paper judges convergence of the learning dynamics to the optimal solution is the comparison of the prices learned by firms with the prices obtained by finding the equilibrium solutions of the economic system.

1 The model

1.1 Households
In this section, a model for n households is proposed. The complete household decision problem is described and the analytic solution is derived. At each time step t, the i-th household supplies labor $L_{i,t}$ to the firms and demands consumption goods $C_{i,t}$, produced by the firms themselves, according to a utility-maximizing behavior. The utility function is characterized by separability between labor, consumption and money. The positive utility is a Cobb-Douglas function of consumption and money holdings, while labor carries a disutility. Overall consumption is obtained by aggregating the consumption of single goods through a Dixit-Stiglitz function, as usual in a monopolistic competition framework. The optimization problem is therefore the following:

$$\max_{C_{ij},\, M_i,\, L_i} \; U_{i,t} = C_{i,t}^{\gamma} \left( \frac{M_{i,t}}{P_t} \right)^{1-\gamma} - \mu L_{i,t}^{\beta}, \qquad (1)$$

The composite consumption good consists of differentiated products produced by monopolistically competitive firms. $C_{i,t}$ and $P_t$ are the consumption index of the i-th household and the goods price index, respectively. The quantity of labor L of type j is supplied in each time period. $M_{i,t}/P_t$ is the real money balance of the i-th household. $L_{i,t}$ is the labor supply, while $W_t$ is the nominal wage, which is the same for all workers under the hypothesis of perfect competition in the labor market. $C_{i,t}$ is obtained by

$$C_{i,t} = m^{\frac{1}{1-\theta}} \left( \sum_{j} C_{ij,t}^{\frac{\theta-1}{\theta}} \right)^{\frac{\theta}{\theta-1}}, \qquad (2)$$

where θ is the elasticity of substitution among goods. To guarantee the existence of an equilibrium, θ is restricted to be greater than unity. $P_t$ is the corresponding price index, i.e.,

$$P_t = \left( \frac{1}{m} \sum_{j} P_{j,t}^{1-\theta} \right)^{\frac{1}{1-\theta}}, \qquad (3)$$
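The two indices can be illustrated numerically. The following is a minimal sketch, not code from the paper: it assumes the standard Dixit-Stiglitz form of the consumption index, with outer exponent θ/(θ−1), and the function names `consumption_index` and `price_index` are hypothetical.

```python
import numpy as np


def consumption_index(quantities, theta):
    """Dixit-Stiglitz composite of the quantities of m goods (Eq. 2).

    Assumes the standard form with outer exponent theta / (theta - 1).
    """
    c = np.asarray(quantities, dtype=float)
    m = c.size
    rho = (theta - 1.0) / theta
    return m ** (1.0 / (1.0 - theta)) * np.sum(c ** rho) ** (1.0 / rho)


def price_index(prices, theta):
    """Price index associated with the composite good (Eq. 3)."""
    p = np.asarray(prices, dtype=float)
    return np.mean(p ** (1.0 - theta)) ** (1.0 / (1.0 - theta))


# Symmetric case: with equal quantities the index equals total consumption,
# and with equal prices the index equals the common price.
symmetric_consumption = consumption_index([2.0, 2.0, 2.0], theta=5.0)
symmetric_price = price_index([3.0, 3.0, 3.0], theta=5.0)
```

In the symmetric case the normalization by m makes the index coincide with the sum of the individual quantities, which is why the equilibrium condition can later be stated simply as all prices being equal to the price level.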

The household is subject to a budget constraint given by

$$\Delta M_{i,t} = W_t L_{i,t} + \sum_{j} \left( D_{ij,t} - P_{j,t} C_{ij,t} \right), \qquad (4)$$

where $M_{i,t}$ denotes the household's nominal holdings of money. Thus, the constraint imposes that total expenditure on consumption goods plus money holdings must not exceed the sum of labor income, dividends D from firms and the money held from the previous period. The households face the problem of optimizing the allocation across differentiated goods at each time period t, given the overall level of expenditure. Given the utility function and the budget constraint, households decide their consumption, the amount of labor to supply, and the quantity of money to hold. The first order conditions for consumers are

$$\begin{aligned}
C_{i,t} &= \gamma \frac{R_{i,t}}{P_t} \\
C_{ij,t} &= \left( \frac{P_{j,t}}{P_t} \right)^{-\theta} \frac{\gamma R_{i,t}}{m P_t} \\
L_{i,t} &= \left( \frac{\mu}{\gamma^{\gamma} (1-\gamma)^{1-\gamma}} \right)^{\frac{1}{1-\beta}} \left( \frac{W_t}{\beta P_t} \right)^{\frac{1}{\beta-1}} \\
M_{i,t} &= (1-\gamma) R_{i,t}
\end{aligned} \qquad (5)$$

with $R_{i,t}$ defined as

$$R_{i,t} = W_t L_{i,t} + \sum_{j} D_{ij,t} + M_{i,t-1}, \qquad (6)$$
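As a rough illustration of how the household conditions fit together, the sketch below evaluates them for given prices and checks that spending plus money holdings exhausts the resources $R_{i,t}$. It is not the authors' code. The function name, the argument names and the parameter values are hypothetical, and the constant in the labor supply is an assumption obtained by substituting the consumption and money conditions into Eq. 1.

```python
import numpy as np


def household_choice(prices, wage, dividends, money_prev,
                     gamma, theta, beta, mu):
    """Evaluate the household first order conditions (Eqs. 5 and 6).

    `dividends` is the total received from all firms. The labor-supply
    constant is derived from Eq. 1 and is an assumption of this sketch.
    """
    p = np.asarray(prices, dtype=float)
    m = p.size
    # Price index of Eq. 3.
    price_level = np.mean(p ** (1.0 - theta)) ** (1.0 / (1.0 - theta))
    # Labor supply: elasticity 1 / (beta - 1) with respect to the real wage.
    scale = gamma ** gamma * (1.0 - gamma) ** (1.0 - gamma)
    labor = (scale * wage / (mu * beta * price_level)) ** (1.0 / (beta - 1.0))
    # Resources available in the period (Eq. 6).
    resources = wage * labor + dividends + money_prev
    # A share gamma goes to consumption, the remainder is held as money.
    consumption = gamma * resources / price_level
    demand = (p / price_level) ** (-theta) * consumption / m
    money = (1.0 - gamma) * resources
    return labor, resources, consumption, demand, money


prices = np.array([1.0, 1.2, 0.9])
labor, resources, consumption, demand, money = household_choice(
    prices, wage=1.0, dividends=0.3, money_prev=2.0,
    gamma=0.7, theta=5.0, beta=2.0, mu=1.0)
# Spending on goods plus money holdings equals the available resources.
budget_gap = float(np.dot(prices, demand) + money - resources)
```

The check on `budget_gap` reflects the Cobb-Douglas structure: a constant share γ of resources is spent on goods regardless of how individual prices differ.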

It is worth noting that the elasticity of the demand for good j with respect to its relative price is θ, and the elasticity of labor supply with respect to the real wage is $1/(\beta-1)$.

1.2 Firms
The model is populated by m monopolistically competitive firms that produce m differentiated goods, whose degree of substitutability is governed by the elasticity of substitution θ. The limit case of high values of θ corresponds to perfect substitutability and, therefore, to perfect competition among firms. Each good j is produced by firm j according to a production function of the form

$$Y_{j,t} = L_{j,t}^{\alpha}, \qquad (7)$$

For the sake of simplicity, labor is the only factor of production and the technological endowment is the same for each firm. Nominal profits for firm j are

$$\Pi_{j,t} = P_{j,t} Y_{j,t} - W_t L_{j,t}, \qquad (8)$$

Each firm seeks to optimize its stream of expected real profits, i.e.,

$$\max_{P_j,\, Y_j} \; \sum_{t} \delta^{t} \left( \frac{\Pi_{j,t}}{P_t} \right), \qquad (9)$$

where δ is the discount factor; for δ = 0 the problem reduces to instantaneous profit optimization. Since the goods demand curves are identical and all firms share the same production technology, there is complete symmetry among firms. Therefore, in a general economic equilibrium the prices set by the firms are equal to one another and equal to the general price level, i.e., $P_{j,t}/P_t = 1$ for any j. Imposing this constraint and solving the optimization problem of Eq. 9 for the firms, with δ = 0, one obtains the equilibrium relation

$$\left( \frac{W}{P} \right)^{*} = \left( \frac{m}{nk} \right)^{\frac{(\beta-1)(1-\alpha)}{\beta-\alpha}} \left( \frac{\theta-1}{\theta}\, \alpha \right)^{\frac{1}{\beta-\alpha}}, \qquad (10)$$

Given this solution as a benchmark, the aim of our research is to solve the general inter-temporal optimization problem represented by Eq. 9 in a multi-agent scenario, i.e., m firms competing monopolistically.
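The static (δ = 0) pricing problem behind the benchmark can be illustrated numerically. The sketch below is not the authors' derivation. It assumes that firm j faces the demand curve $Y_j = (P_j/P)^{-\theta}\,\bar{Y}$ implied by the household first order conditions, takes the price level, the demand scale $\bar{Y}$ and the nominal wage as given, and checks by grid search that the profit-maximizing price equals the mark-up θ/(θ−1) over nominal marginal cost. All function names and parameter values are illustrative.

```python
import numpy as np


def static_profit(price, price_level, demand_scale, wage, theta, alpha):
    """One-period nominal profit (Eq. 8) of a firm posting `price`."""
    quantity = (price / price_level) ** (-theta) * demand_scale
    labor = quantity ** (1.0 / alpha)  # invert Y = L^alpha (Eq. 7)
    return price * quantity - wage * labor


def marginal_cost(price, price_level, demand_scale, wage, theta, alpha):
    """Nominal marginal cost at the quantity demanded at `price`."""
    quantity = (price / price_level) ** (-theta) * demand_scale
    return wage * quantity ** ((1.0 - alpha) / alpha) / alpha


def best_price(price_level, demand_scale, wage, theta, alpha, grid):
    """Profit-maximizing price on a grid of candidate prices."""
    profits = static_profit(grid, price_level, demand_scale,
                            wage, theta, alpha)
    return float(grid[np.argmax(profits)])


theta, alpha = 5.0, 0.8
grid = np.linspace(0.5, 3.0, 25001)
p_star = best_price(1.0, 1.0, 1.0, theta, alpha, grid)
# Price implied by the mark-up rule P_j = theta / (theta - 1) * MC.
implied = theta / (theta - 1.0) * marginal_cost(
    p_star, 1.0, 1.0, 1.0, theta, alpha)
```

Up to the grid resolution, `p_star` and `implied` coincide, which is the static mark-up condition that the symmetric equilibrium combines with the labor supply of the households.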

2 Preliminary results and discussion

In this paper, a computational approach based on reinforcement learning is presented. Generally speaking, reinforcement learning techniques, such as the Q-learning algorithm, can be successfully applied to solve inter-temporal optimization problems where the system evolves as a Markov Decision Process (MDP).

Fig. 1. Dynamics of the price level and the overall production set by learning firms. Straight lines represent the analytical equilibrium values. Panels: price level P versus time t; overall production versus time t.

It is worth noting that, under the MDP framework, convergence of Q-learning to the optimal solution has been proved. Conversely, the proposed multi-agent framework does not correspond to an MDP: from the point of view of a specific seller, the system does not evolve according to a Markovian transition probability matrix. However, it is worth remarking that analytical results for a multi-agent inter-temporal optimization problem are not achievable. We propose to tackle this difficult task by means of a genuinely computational approach. We implement firms that individually learn the best pricing strategies by means of a Q-learning algorithm. The Q-learning algorithm makes it possible to learn recursively the discounted sum of the instantaneous utilities by interacting directly with the economic environment. According to our model, firms should be able to learn the appropriate price and, consequently, the quantity to produce. The notable feature of this problem setting is the presence of an analytical framework, developed in order to check the learning results. Fig. 1 presents the results of a preliminary computational experiment performed with three firms. The price level and the overall production of the economy tend to converge towards their analytical equilibrium levels in the long run.
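The learning rule can be sketched in a deliberately simplified setting. The code below is not the authors' implementation and does not reproduce the three-firm experiment. It assumes a single firm, a single-state Q-table over a discrete grid of mark-ups, constant returns to scale (α = 1), and a fixed price level, wage and demand scale, so that the reward is deterministic and the analytical optimum is the mark-up θ/(θ−1). All names and parameter values are hypothetical.

```python
import numpy as np


def real_profit(markup, theta, wage=1.0, price_level=1.0, demand=1.0):
    """Real one-period profit of a firm pricing at markup * wage.

    Assumes alpha = 1, so the nominal marginal cost equals the wage.
    """
    price = markup * wage
    quantity = (price / price_level) ** (-theta) * demand
    return (price - wage) * quantity / price_level


def learn_markup(theta=5.0, discount=0.9, lr=0.2, epsilon=0.1,
                 steps=50_000, seed=0):
    """Single-state Q-learning over a grid of candidate mark-ups."""
    rng = np.random.default_rng(seed)
    markups = np.linspace(1.05, 1.50, 10)
    q = np.zeros(markups.size)
    for _ in range(steps):
        # Epsilon-greedy action selection.
        if rng.random() < epsilon:
            action = int(rng.integers(markups.size))
        else:
            action = int(np.argmax(q))
        reward = real_profit(markups[action], theta)
        # Q-learning update; with one state the next state is the same.
        q[action] += lr * (reward + discount * q.max() - q[action])
    return float(markups[int(np.argmax(q))]), q


learned_markup, q_values = learn_markup()
```

In a multi-firm version, each firm would hold its own Q-table and the price level would be recomputed from all posted prices at every step, which is what makes the environment non-Markovian from the point of view of a single seller.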

Acknowledgement
This work has been partially supported by the University of Genoa, by the Italian Ministry of Education, University and Research (MIUR) under grants FIRB 2001 and COFIN 2004, and by the European Union under the IST-FET "Complex Systems" STREP Project EURACE.

References
[1] A. K. Dixit, J. E. Stiglitz, Monopolistic Competition and Optimum Product Diversity, American Economic Review, June 1977, 297-308.
[2] G. Tesauro, J. O. Kephart, Pricing in Agent Economies Using Multi-Agent Q-Learning, Autonomous Agents and Multi-Agent Systems, 2002, 5, 289-304.
[3] E. Kutschinski, T. Uthmann, D. Polani, Learning competitive pricing strategies by multi-agent reinforcement learning, Journal of Economic Dynamics & Control, 2003, 27, 2207-18.
[4] M. Woodford, Interest and Prices: Foundations of a Theory of Monetary Policy, Princeton University Press, 2003.
[5] L. E. O. Svensson, Sticky Goods Prices, Flexible Asset Prices, Monopolistic Competition, and Monetary Policy, Review of Economic Studies, 1986, LIII, 385-405.
[6] O. J. Blanchard, N. Kiyotaki, Monopolistic Competition and the Effects of Aggregate Demand, American Economic Review, Sept. 1987, 647-66.
[7] J. Bullard, K. Mitra, Learning about monetary policy rules, Journal of Monetary Economics, 2002, 49, 1105-29.
[8] G. W. Evans, S. Honkapohja, Adaptive Learning and Monetary Policy Design, Journal of Money, Credit, and Banking, Dec. 2003, part 2, 1045-72.
[9] B. Preston, Learning about Monetary Policy Rules when Long-Horizon Expectations Matter, International Journal of Central Banking, Sept. 2005, 81-126.
