Zou Chuanwei: Preliminary Research on the Characteristics, Value and Allocation Mechanism of Data Elements
PlatON云图
Special Guest Columnist
2020-05-06 06:00
This article is about 13,898 words; reading the full article takes about 20 minutes.
The definition of data property rights is the basis for the effective allocation of data elements, and it can be implemented through cryptography, blockchain and institutional design.

Editor's Note: This article comes from PlatON (ID: PlatON_network).

Foreword

The "Opinions on Building a More Complete System and Mechanism for the Market-oriented Allocation of Factors", issued by the Central Committee of the Communist Party of China and the State Council, listed data as a factor of production for the first time. This article was written by Dr. Zou Chuanwei, Chief Economist of Wanxiang Blockchain and PlatON. Its main conclusions are as follows. Data is a complex concept with many types and rich characteristics. Understanding data is inseparable from analyzing related concepts such as information and knowledge, which can be done within the framework of the DIKW model. Extracting information, knowledge and wisdom from data helps individuals make better decisions at the micro level and promotes economic growth at the macro level; this is the embodiment of the value of data.

However, much data is a public good or quasi-public good, and the value of data lacks objective measurement standards, so multiple allocation mechanisms exist for data elements. Market-oriented allocation is not the same as market transactions. The definition of data property rights is the basis for the effective allocation of data elements, and it can be implemented through cryptography, blockchain and institutional design. For personal data, control and privacy are more important than ownership.

On April 9, 2020, the Central Committee of the Communist Party of China and the State Council issued the "Opinions on Building a More Complete System and Mechanism for the Market-oriented Allocation of Factors", which for the first time listed data alongside traditional factors such as land, labor, capital and technology, and put forward the requirement to accelerate the cultivation of the data element market, including promoting the opening and sharing of government data, enhancing the value of social data resources, and strengthening the integration and security protection of data resources.

Data as a factor of production is a new proposition, and a large number of frontier issues remain to be studied. In the literature, related issues fall under the category of the data economy, which refers to the economic ecosystem composed of activities such as data collection, organization, use, sharing, circulation and management.

This paper conducts a preliminary discussion of three questions. First, what are the important technical and economic characteristics of data elements? Second, what is the connotation of data value and how can it be measured? Third, what are the allocation mechanisms for data elements?


1. Technical and economic characteristics of data elements

What is data? Contrary to what is commonly believed, this is a fundamental yet complex question in information science, with no obvious answer. Understanding data is inseparable from analyzing related concepts such as information and knowledge. Ackoff (1989) proposed the DIKW model (Figure 1), in which D refers to Data, I to Information, K to Knowledge, and W to Wisdom. The DIKW model is widely used in information management, information systems and knowledge management, and different researchers interpret it from different angles; Rowley (2007) provides a review. This article does not discuss the DIKW model in depth, but only sorts out, on the basis of Rowley (2007), the technical characteristics of data most relevant to economic analysis.


Figure 1: DIKW model

First, wisdom, knowledge, information and data form a nested hierarchy, from the narrowest scope to the widest. Information can be extracted from data, knowledge can be summarized from information, and wisdom can be distilled from knowledge. These extractions, summaries and distillations are not simple mechanical processes; they rely on different methodologies and additional inputs (such as application scenarios and background knowledge of related disciplines). Therefore, although information, knowledge and wisdom also belong to the category of data, they are "higher-order" data.

Second, data is a product of observation. Observation objects include objects, individuals, institutions, events, and their environments. Observation relies on a range of perspectives, methods and tools, accompanied by corresponding symbolic representation systems, such as units of measurement. Data is the product of recording the characteristics and behavior of observed objects using these symbolic representation systems. Data can take the form of text, numbers, graphs, sound and video. In form, data can be digital or non-digital (for example, recorded on paper). However, with the development of information and communication technology (ICT), more and more data is digitized and represented as binary at the bottom layer.

Third, data is processed through cognitive processes to obtain information, which answers questions of who, what, where and when. Information is organized and structured data that is relevant to specific goals and situations and therefore has value and meaning. For example, according to information theory, information reduces uncertainty as measured by entropy.
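To make the information-theoretic point concrete, here is a minimal Python sketch in which an informative signal sharpens a distribution and thereby lowers its Shannon entropy; the probabilities are hypothetical:

```python
import math

def entropy(probs):
    """Shannon entropy (in bits) of a discrete probability distribution."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

# Hypothetical example: a binary event that is initially a coin flip.
prior = [0.5, 0.5]
# After an informative signal arrives, the distribution sharpens.
posterior = [0.9, 0.1]

print(entropy(prior))      # 1.000 bit of uncertainty
print(entropy(posterior))  # ~0.469 bits; the signal removed ~0.531 bits
```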

Fourth, knowledge and wisdom are harder to define precisely than data and information. Knowledge is the application of data and information to answer questions of how. Wisdom carries a distinct element of value judgment and is often concerned with predicting the future and with value orientation.

Next, we use econometrics to illustrate the DIKW model. Econometrics is the main method of empirical analysis in economics, and empirical analysis is based on observation, answering "what is" questions. In econometrics, the object of observation is usually called a sample, which can be an individual, an institution, a region or even a country. Observing samples from different angles corresponds to the concept of variables in econometrics. Observing a group of samples at one point in time yields cross-sectional data; observing from the same angle repeatedly at different points in time yields time-series data; combining the two yields panel data. These are all structured data. With the digitization of more and more data and the development of artificial intelligence and big data analysis methods, semi-structured and unstructured data, such as Internet browsing and click data, are increasingly used in economics.

Econometrics extracts information from data mainly by discovering the laws and patterns hidden in the data, estimating models, and testing hypotheses. This corresponds to the information level of the DIKW model. For example, computing descriptive statistics such as the mean, the standard deviation and the correlation coefficients between variables is one of the simplest ways to extract information from data. Econometrics often assumes that data follows a data-generating process whose model form and parameter values are unknown, while random disturbances introduce errors into observations. Econometrics uses observed data to estimate the data-generating process and tests hypotheses on that basis. Artificial intelligence and big data analysis methods are more flexible in processing data and can be divided into predictive analysis and descriptive analysis. Predictive analysis uses the values of some variables to predict the values of other variables. Descriptive analysis derives and summarizes patterns of potential connections in data, including correlations, trends, clusters, trajectories and anomalies. The two types of analysis are embodied in specific methods such as classification, regression, association analysis, cluster analysis, recommendation systems and anomaly detection.
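As an illustration of the simplest information-extraction step just described, the following Python sketch computes descriptive statistics on synthetic data and estimates an assumed linear data-generating process by least squares; the variables, numbers and model form are all hypothetical:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical cross-sectional data: income and spending for 100 samples,
# generated by an assumed linear data-generating process with noise.
income = rng.normal(50, 10, 100)
spending = 0.6 * income + rng.normal(0, 5, 100)

# Descriptive statistics: one of the simplest ways to extract information.
print("mean income:", income.mean())
print("std of income:", income.std(ddof=1))
print("corr(income, spending):", np.corrcoef(income, spending)[0, 1])

# Estimating the unknown parameters of the data-generating process (OLS fit).
slope, intercept = np.polyfit(income, spending, 1)
print("estimated slope:", slope)  # should be close to the true value 0.6
```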

Policy recommendations are put forward based on the results of econometric analysis, corresponding to the knowledge level of the DIKW model. Much policy research is normative analysis, answering the question of what should be. Insights from economics on economic equilibrium, economic growth, macro-control, price mechanism, micro-incentives, and risk pricing correspond to the wisdom level of the DIKW model.

Generally speaking, the technical characteristics of data mainly include the following dimensions:

  • Sample distribution of data, time coverage and variables/attributes/fields etc.

  • Data capacity, such as the number of samples, the number of variables, the length of the time series, and the occupied storage space.

  • Data quality, such as whether the sample is representative, whether the data conform to pre-defined norms and standards, observation granularity, precision and error, and data completeness (such as whether there are missing data).

  • Timeliness of data. Given that the characteristics and behavior of the observed subject can change over time, is the data still reflective of the observed subject?

  • Data sources. Some data come from first-hand observation, some data are provided by first-hand observers, and some data are derived from other data. Data can come from controlled experiments and sample surveys, or from the Internet, social networks, the Internet of Things and the Industrial Internet, among others. Data can be generated by humans or by machines. Data can come from online or offline.

  • The type of data, including whether it is digital or non-digital, structured or unstructured, and the form it exists in (text, numbers, graphics, sound, video, etc.).

  • Interoperability and linkability between different data sets, such as whether the sample ID is unified, whether the variable definition is consistent, and whether the data unit is consistent, etc.

  • Whether it is personal data. Personal data has many particularities in terms of privacy protection, which need to be discussed separately.

Compared with the technical characteristics of data, the economic characteristics of data are much more complicated. Data can generate value (see below), so it has asset properties. Data has the characteristics of both goods and services. On the one hand, data can be stored and transferred, similar to commodities. Data can accumulate without physically diminishing or corrupting. On the other hand, a lot of data is intangible, similar to services. Data as an asset has many particularities, which can be analyzed from the perspective of Table 1:


Table 1: Classification of public goods, quasi-public goods and private goods

Non-rivalry means that when one person consumes a product, it does not reduce or limit other people's consumption of that product. In other words, the marginal cost of each additional consumer of the product is equal to zero. Most data can be reused without reducing data quality or capacity, and can be used by different people at the same time, so it is non-rivalrous.

Non-excludability means that when someone pays to consume a product, others who do not pay cannot be excluded from consuming it, or exclusion is costly. Much data is non-excludable, such as weather forecast data. But through technology and institutional design, some types of data can be made excludable. For example, some media information terminals are subscription-based, and only paying members can read them.

According to Table 1, much data is a public good that can be freely used, transformed and shared by anyone for any purpose, for example, economic statistics and weather forecast data released by the government. Some data are club goods, a type of quasi-public good, such as the paid media information terminals mentioned above. Because most data is non-rivalrous, little data falls into the categories of private goods or common resources.

Ownership of data is a complex issue both legally and practically, especially for personal data. Data can easily be collected, stored, copied, disseminated and processed without proper authorization, and data collection and processing are accompanied by the generation of new data. This makes data ownership difficult to define clearly and difficult to protect effectively. For example, in the Internet economy, Internet platforms record users' clicks, browsing and shopping histories, which are very valuable data. Although these data describe users' characteristics and behavior, they are not provided by users in the way personally identifiable information is, so it is hard to say they are owned by users. And although Internet platforms record and store these data, the data are closely tied to users' privacy and interests, and platforms can hardly use and dispose of them without users' knowledge, so the platforms do not hold complete property rights either.

Many articles compare data to the oil of the new economy, but the metaphor is not accurate. Oil is rivalrous and excludable, its property rights can be clearly defined, and as a private good it has developed complex market trading models such as spot and futures. Much data, by contrast, has ownership that is difficult to define clearly, and as a public or quasi-public good it is difficult to trade effectively in markets. It is therefore more appropriate to compare data to sunlight.


2. The connotation and measurement of data value

(1) Connotation of data value

According to the DIKW model, information, knowledge and wisdom are extracted from data, which implies the concept of a data value chain: raw data is processed and integrated with other data, analyzed to generate actionable insights, and finally acted upon to generate value.

The value of data can be understood at both the micro and macro levels. At the micro level, information, knowledge and wisdom can satisfy users' curiosity (that is, serve as a final product) and can also improve users' cognition and help them make better decisions (that is, serve as an intermediate product); in both cases they increase users' utility, and this improvement in utility reflects the value of data. At the macro level, information, knowledge and wisdom help raise total factor productivity and play a multiplier role, which is also a reflection of the value of data. This article mainly discusses the value of data at the micro level, which has the following key features.

1. The same data can be of very different value to different people

First, different people have different analysis methods, and the information, knowledge and wisdom they extract from the same data can vary greatly. In the history of science, many scientists have delved into phenomena that the public took for granted and made important discoveries: falling objects to Newton, lightning to Franklin, and the blue of the sea to Raman meant something entirely different than they did to the general public. Likewise, in economics, different economists often give completely different interpretations of the same economic data.

Second, different people are in different scenarios and face different problems, so the same data serves them differently. Data that is garbage to some people may be treasure to others. For example, archaeological discoveries are of great value to historians but likely of little value to financial investors. Conversely, alternative data, including personal data, business process data and sensor data, can help investors make investment decisions but is of little value to non-investors. Different people also use data in different time dimensions, such as evaluating the past, analyzing the present, predicting the future, or backtesting; different purposes impose different requirements on data, and the same data carries different value accordingly.

Third, different institutional and policy frameworks impose different restrictions on the use of data, which also affects its value. In other words, the value of data is endogenous to institutions and policies. For example, countries protect personal data to different degrees, so the collection and use of personal data, and the value generated, vary greatly across countries. China's leading Internet platforms have launched online credit products based on user behavior data, which is uncommon in other countries. If an Internet platform obtains user data but fails to protect and use it properly or to respect user privacy, its brand image and user trust will suffer, with negative consequences for both data value and company value. In April 2020, a U.S. federal court approved the $5 billion settlement between Facebook and the U.S. Federal Trade Commission over the Cambridge Analytica scandal.

2. The value of data changes over time

First, data has limited timeliness. After a period of time, much data declines in value because it no longer reflects the current state of the observed object, a phenomenon called data depreciation. Data depreciation is evident in financial markets: a piece of news can strongly move a security's price when first released, but once the price reflects the news, its value to financial investors drops sharply toward zero. In the DIKW model, data is refined into information, knowledge and wisdom, and the higher the level of refinement, the more resistant it is to data depreciation.
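As a purely illustrative model (not one proposed in this article), data depreciation can be sketched as exponential decay; the initial value and decay rate below are hypothetical:

```python
import math

def data_value(v0, decay_rate, t):
    """Toy exponential-depreciation model: remaining value t periods after release."""
    return v0 * math.exp(-decay_rate * t)

# Hypothetical: a news item worth 100 at release, decaying fast (rate 1.5 per day).
for day in range(4):
    print(day, round(data_value(100, 1.5, day), 2))
# 0 -> 100.0, 1 -> 22.31, 2 -> 4.98, 3 -> 1.11: most value is gone within days
```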

Second, data has option value. New opportunities and technologies can bring new value to existing data, so in many cases data is collected not only for immediate needs but also for future benefits.

3. Data creates externalities

First, the value of data to an individual is called its private value, and its value to society is called its public value. Data that is non-excludable or non-rivalrous creates externalities, driving a wedge between private and public value. These externalities can be positive or negative, and there is no general conclusion as to which dominates.

Second, the value of a combination of data sets can differ from the sum of their individual values, which is another externality. Whether data aggregation adds value is also unsettled. On the one hand, there may be increasing returns to scale: for example, more data can better reveal hidden laws and trends. On the other hand, there may be diminishing returns to scale, where more data introduces more noise. In general, larger data volume does not necessarily mean higher value; data content also matters greatly. For example, in an hour of video surveillance footage, the valuable portion may be only one or two seconds.

(2) Measurement of data value

1. Absolute Valuation

In view of the three key characteristics of data value, the absolute valuation of data is relatively difficult, and there is no generally accepted method. There are several main approaches in current industry practice, but all of them are flawed (BIPP, 2020; Deloitte and Ali Research Institute, 2019).

The first is the cost approach, which uses the costs of collecting, storing and analyzing data as the valuation benchmark. These costs range from software and hardware to intellectual property and human resources, as well as contingent costs from security incidents, loss of sensitive information or reputational damage. Data collection and analysis generally feature high fixed costs and low marginal costs, and thus economies of scale. Although the cost approach is easy to implement, it struggles to capture the different values of the same data to different people, at different points in time, and in combination with other data. In addition, Deloitte and Ali Research Institute (2019) pointed out that some data are by-products of enterprises' production and operation, whose acquisition costs are usually hard to separate from the business and hard to measure reliably. Obviously, the value of data is not necessarily higher than its cost, which means that not all data is worth collecting, storing and analyzing.

The second is the income approach, which assesses the social and economic impact of the data, predicts the resulting future cash flows, and discounts those cash flows to the present. The income approach is logically similar to the discounted cash flow method in company valuation. It can accommodate the three key characteristics of data value and is theoretically sound, but it faces many obstacles in implementation: first, the difficulty of modeling the social and economic impact of data; second, how to evaluate the option value of data, for which real option valuation is an alternative, though an imperfect one.
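The discounting logic of the income approach can be sketched in a few lines; the cash flows and discount rate below are hypothetical placeholders:

```python
def discounted_value(cash_flows, rate):
    """Present value of future cash flows under a constant discount rate."""
    return sum(cf / (1 + rate) ** t for t, cf in enumerate(cash_flows, start=1))

# Hypothetical annual cash flows attributed to a data asset, declining as the
# data depreciates, discounted at an assumed 10% rate.
flows = [100, 120, 90, 60, 30]
print(round(discounted_value(flows, 0.10), 2))  # ~317.31
```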

The third is the market approach, which uses the market prices of traded data as benchmarks to value data that is not traded. The market approach is similar to valuing stocks by price-earnings or price-to-book ratios. Its disadvantage is that much data is non-excludable or non-rivalrous and therefore hard to trade in markets. Current attempts at data element markets lack sufficient thickness and liquidity, so their price discovery function is imperfect. In addition, the merger and acquisition prices of some companies embed a valuation of data, but it is not easy to separate it out.

The fourth is the questionnaire method, aimed mainly at personal data. It uses questionnaires to test how much individuals would be willing to accept to sell their own data, or how much they would be willing to pay to protect it, and values personal data accordingly. This method has a very narrow scope of application and is relatively costly to implement.

2. Relative Valuation

The goal of relative valuation is, given several groups of data and a common task, to evaluate each group's contribution to completing the task. Compared with absolute valuation, relative valuation is simpler, especially for quantitative data analysis tasks.

In the relative valuation of data, common ways of grouping data include: first, the same variables/fields but different observation samples; second, the same observation samples but different variables/fields. For common predictive and descriptive tasks, statistics and data science have established quantitative evaluation indicators. For prediction tasks, out-of-sample tests are used to evaluate prediction error: when the predicted variable is discrete, common indicators include the accuracy rate, the error rate, and the area under the Receiver Operating Characteristic (ROC) curve; when the predicted variable is continuous, the standard error is commonly used. For descriptive tasks, sample data is used to evaluate model fit: linear models generally use R-squared, and nonlinear models generally use likelihood functions (which require assumptions about the distribution of the error terms).
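The sketch below computes the indicators named above with scikit-learn (an assumed tooling choice); the labels, scores and predictions are hypothetical:

```python
import numpy as np
from sklearn.metrics import accuracy_score, roc_auc_score, r2_score

# Hypothetical out-of-sample results for a discrete prediction task.
y_true = np.array([1, 0, 1, 1, 0, 1, 0, 0])
y_score = np.array([0.9, 0.2, 0.7, 0.6, 0.4, 0.8, 0.3, 0.5])
y_pred = (y_score >= 0.5).astype(int)

print("accuracy:", accuracy_score(y_true, y_pred))   # share of correct labels
print("ROC AUC:", roc_auc_score(y_true, y_score))    # area under the ROC curve

# Hypothetical continuous task: goodness of fit of a model on sample data.
y_cont_true = np.array([3.0, 2.5, 4.1, 3.8])
y_cont_pred = np.array([2.8, 2.7, 4.0, 3.5])
print("R-squared:", r2_score(y_cont_true, y_cont_pred))
```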

The relative valuation of data shows that the value of the same data is different when it is used for different tasks, using different analysis methods, or combined with different data. In particular, data that deviates from the "mainstream" of the data set may have a higher relative valuation than data that is close to the "mainstream" of the data set, which shows the value of "outliers".
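The summary of this article notes that Shapley values can be used for such relative valuation (see Jia et al., 2019, in the references). Here is a minimal exact-computation sketch, assuming a hypothetical utility function that maps each subset of data groups to out-of-sample accuracy:

```python
from itertools import combinations
from math import factorial

def shapley_values(players, utility):
    """Exact Shapley value of each data group under a coalition utility function."""
    n = len(players)
    values = {p: 0.0 for p in players}
    for p in players:
        others = [q for q in players if q != p]
        for k in range(n):
            for coalition in combinations(others, k):
                s = set(coalition)
                # Weight of this coalition size in the Shapley formula.
                weight = factorial(k) * factorial(n - k - 1) / factorial(n)
                values[p] += weight * (utility(s | {p}) - utility(s))
    return values

# Hypothetical utility: out-of-sample accuracy achieved with each subset of
# three data groups A, B and C (all numbers invented for illustration).
ACCURACY = {
    frozenset(): 0.50,
    frozenset({"A"}): 0.70, frozenset({"B"}): 0.65, frozenset({"C"}): 0.55,
    frozenset({"A", "B"}): 0.80, frozenset({"A", "C"}): 0.72,
    frozenset({"B", "C"}): 0.66, frozenset({"A", "B", "C"}): 0.82,
}

print(shapley_values(["A", "B", "C"], lambda s: ACCURACY[frozenset(s)]))
# The three values sum to 0.32, the accuracy gain from using all data groups.
```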


3. Allocation mechanisms for data elements

In reality, data comes in many types with different characteristics, which give rise to different allocation mechanisms. Much data is not suitable for market transactions, and many allocation mechanisms are not market-transaction modes; in other words, market-oriented allocation is not the same as market transactions.

These mechanisms all address two salient problems in the allocation of data elements. The first is information asymmetry. Allocation involves multiple parties with inconsistent interests. Data subjects often do not know when their data will be collected, for what purpose, or with what consequences. Data producers do not know whether data subjects selectively disclose data, or whether subjects adjust their behavior once they know their data is being collected, nor do producers know the value of the data they produce to different users. And data users find it difficult to fully understand the value of data to themselves in advance; relative valuation, for example, is done after the fact.

The second is incomplete contracts. Any allocation mechanism for data elements can be expressed as a combination of contracts. But data applications involve rich scenarios, the data value chain has many links, and data value lacks objective measurement standards. These factors make it difficult for any mechanism to cover all situations that may arise after the fact, which affects both the incentives of data subjects to share data and of data producers to produce it, and the reasonable distribution of data value among the contributors along the data value chain.

Next, we discuss representative allocation mechanisms for data elements according to the economic characteristics of data and its application scenarios.

(1) Data as public goods

When data is a public good, the private sector will underinvest in it and undersupply it, so it is generally provided by government departments funded by tax revenue. The data openness and sharing projects of government departments can be understood in this framework. Government departments should open government data to society and the market as much as possible, provided confidentiality is not involved, so as to maximize its public value.

In 2009, the U.S. federal government launched Data.gov, an open data portal providing a unified hosting platform for data previously scattered across the websites of different federal agencies. In 2019, the U.S. OPEN Government Data Act required that, except for data withheld for national security and other special reasons, the federal government publish the data it holds online in standardized, machine-readable form.

Since 2016, China has promulgated a series of documents, such as the "Interim Measures for the Management of Government Information Resources Sharing" and the "Work Plan for the Opening of Public Information Resources", starting the process of sharing and opening up government data. The first direction of work proposed in the "Opinions on Building a More Complete System and Mechanism for the Market-oriented Allocation of Factors" is to promote the opening and sharing of government data.

(2) Data as quasi-public goods

If data as a quasi-public good has clear ownership and excludability, there are three main allocation mechanisms.

First, data as a club good can adopt a paid subscription model, as with paid media information terminals.

Second, the open banking model. Banks open user data to authorized third-party institutions through Application Programming Interfaces (APIs) to facilitate the development and use of that data. Banks limit both which user data can be opened and to which institutions. This is, in effect, a partial implementation of user data portability.

Third, the data trust model. According to BIPP (2020), data trusts can take different forms, such as legal trusts, deeds, companies, and public and community trusts. The main objectives of a data trust include: first, to enable data to be shared; second, to promote the public interest and the private interests of data sharers; third, to respect the interests of those who hold legal rights to the data; and fourth, to ensure that data is shared ethically, in accordance with the trust's rules.

(3) Personal data and the PIK model

As mentioned above, in the Internet economy, if personal data is not provided by users but comes from Internet platforms' observation and recording of user characteristics and behavior, ownership is difficult to define clearly. In reality, Internet platforms often provide users with free information and social services, with the goal of expanding their user base and gaining users' attention and personal data (such as user preferences, consumption characteristics and social connections). In this mode, users can be thought of as exchanging their attention and personal data for information and social services, hence the name PIK (payment-in-kind) model (Figure 2). Internet platforms monetize user traffic through advertising revenue on the one hand, and conduct precision marketing and develop credit products based on personal data on the other.


Figure 2: PIK model of the Internet platform

The PIK model has three main disadvantages. First, the Internet platform and the user are in unequal positions, making it easy for the platform to collect user data without authorization, collect it excessively, or collect personal data in business A and use it in business B, causing privacy violations and data misuse. Second, if the platform forms a captive ecosystem, it locks in users and in practice controls their data; it is difficult for users to open or migrate their data to the platform's competitors, and the platform can engage in unfair competition through data monopoly. Third, it is hard to guarantee that users receive reasonable compensation for their personal data; for example, are users giving up important personal information in exchange for less valuable information services? The unequal positions of platforms and users, and the absence of a market pricing mechanism in the PIK model, make it difficult to protect users' rights and interests effectively.

In the PIK model, the data controller (the Internet platform) dominates the data subject (the user), and the controller is often also the data user, while the subject lacks control over their own data; data property rights remain ambiguous in many respects. How to correct the disadvantages of the PIK model is a core issue in personal data governance.

(4) Data element market

Because much data is non-excludable or non-rivalrous, its participation in market transactions is restricted. Moreover, the externalities caused by non-excludability or non-rivalry drive a wedge between the private and public value of data, so market transactions may not maximize the social value of data.

In reality, because data types and characteristics are so diverse and data value lacks objective measurement standards, there is currently no centralized, liquid data element market. But peer-to-peer data trading (similar to over-the-counter trading) has long been happening, for example in the alternative data market. This market has a large number of alternative data providers, which, from shallow to deep processing, can be roughly divided into raw data providers, lightly processed data providers and signal providers. The market has also developed consulting intermediaries, data aggregators and technical support intermediaries as bridges between data buyers (mainly investment funds) and data providers. Consulting intermediaries advise buyers on alternative data purchases, processing and related legal matters, and provide information on data suppliers. Data aggregators provide integration services, so buyers need only negotiate with them rather than deal with fragmented providers. Technical support intermediaries provide technical consulting to buyers, covering databases, modeling and so on.

The alternative data market is thus fairly well developed, with a rich division of labor and cooperation, but it remains very opaque and non-standardized, a common problem in current data transactions. Illegal data transactions cannot be ignored either, such as the "data black market" and "data black industry" that trade in personal privacy data. Since 2019, China has launched a centralized crackdown on the "data black industry".

How can a compliant and effective data element market be established? A viable option is to use cryptography, including verifiable computation, homomorphic encryption and secure multi-party computation (PlatON, 2018).

For a complex computing task, verifiable computation generates a short proof; as long as this proof is verified, one can judge whether the task was executed accurately, without re-executing it. Under homomorphic encryption and secure multi-party computation, data is provided externally as ciphertext rather than plaintext, which makes the data excludable. These cryptographic technologies support the confirmation of data rights, making it possible to trade data usage rights without affecting data ownership, thereby building the property rights basis for data transactions and reshaping the economic relationship between data subjects and data controllers. Blockchain technology, used for data storage and authorization, also plays an important role in defining data property rights. As discussed later, beyond technology, data property rights can also be defined through institutional design.
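To convey the flavor of secure multi-party computation, here is a toy additive secret-sharing sketch (not PlatON's actual protocol): three parties learn the sum of their inputs while no party ever sees another's plaintext value.

```python
import secrets

PRIME = 2**61 - 1  # modulus of the toy finite field (a Mersenne prime)

def share(value, n_parties):
    """Split a value into n additive shares; any n-1 shares reveal nothing."""
    shares = [secrets.randbelow(PRIME) for _ in range(n_parties - 1)]
    shares.append((value - sum(shares)) % PRIME)
    return shares

# Hypothetical scenario: three data controllers each hold a private number
# and want the total without exposing their plaintext inputs to one another.
inputs = [42, 17, 99]
all_shares = [share(v, 3) for v in inputs]

# Each party locally adds the one share it received of every input...
partial_sums = [sum(s[i] for s in all_shares) % PRIME for i in range(3)]
# ...and only the combined result is ever reconstructed.
print(sum(partial_sums) % PRIME)  # 158: the sum, with no plaintext revealed
```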

Even so, a cryptography-based data element market differs from traditional markets. First, the same data, once encrypted, can still be provided to multiple parties at the same time, so it remains non-rivalrous, unless the data user and data controller sign a confidentiality agreement requiring the latter not to provide the data to others, or the data is highly time-sensitive and loses value quickly after use. In other words, it is difficult for data to become a private good, so it is difficult for data to trade like one. Second, the value of the same data can vary greatly across users, which makes it difficult to extract meaningful pricing information from transaction prices in cryptography-based trading of data usage rights. A cryptography-based data element market will therefore not adopt the allocation mode of "multiple buyers bid for the same commodity, and the highest bidder wins".

It should also be noted that the data element market is not necessarily a simple matchmaking model; other, more complex models are possible. For example, the model by which Markit (which merged with IHS in 2016 to form IHS Markit) established its CDS (credit default swap) pricing data service is worth studying. Before the global financial crisis, CDS trading was purely over-the-counter and information disclosure was very imperfect. CDS positions are important commercial secrets of financial institutions and hard to share with other institutions, so participants in the CDS market knew only their own positions, not the overall state of the market; the market had no good index, and information asymmetry was high. Founded in 2003, Markit counted the major CDS market makers among its shareholders. These shareholder financial institutions uploaded their own CDS data to Markit, and Markit integrated the market data and provided it externally for a fee, including pricing and reference data, index products, and valuation and trading services. Without revealing their commercial secrets, Markit's shareholder institutions both learned the overall state of the CDS market from Markit's work and earned investment returns from Markit's business growth. Although Markit did not explicitly price data, it solved the incentive-compatibility problem of data sharing through the interest-binding function of equity and the "1+1>2" effect of data integration. This is a complex and ingenious data transaction model. Baihang Credit, in China's personal credit reporting market, can be understood within a similar framework.

(5) Definition of data property rights

The allocation mechanisms introduced above show that the definition of data property rights is the basis for the effective allocation of data elements. Data property rights divide mainly into ownership and control. Control over data covers who can use the data, how it can be used, and whether it can be further shared externally. In corporate governance, ownership and control are unified: shareholders own the company, and the general meeting of shareholders is its highest authority. But the ownership and control of data can be separated, especially for personal data whose ownership is unclear. Data property rights can be defined by technology, such as the cryptographic techniques of verifiable computation, homomorphic encryption and secure multi-party computation, and they can also be defined through institutional design.

In May 2018, the European Union began implementing the General Data Protection Regulation (GDPR). GDPR grants data subjects extensive rights. First, the right to be forgotten: data subjects may request that the data controller delete their personal data, to prevent its dissemination. Second, the right to portability: data subjects may obtain their personal data from the controller and decide on its use themselves. Third, consent: the data subject authorizes the controller to process personal data voluntarily, for a specific purpose and on an equal footing with the controller, but the authorization has no permanent legal effect and can be withdrawn at any time. Fourth, special conditions govern the processing of special categories of personal data, such as medical data.


4. Summary

This paper has conducted a preliminary study of the characteristics, value and allocation mechanisms of data elements. The main conclusions are as follows.

As a basic but complex concept in information science, data cannot be understood without analyzing related concepts such as information and knowledge, and the DIKW model provides a suitable analytical framework. According to the DIKW model, wisdom, knowledge, information and data form a nested hierarchy from the narrowest scope to the widest. Data is a product of observation. Data is processed through cognitive processes to obtain information, which answers questions of who, what, where and when. Knowledge is the application of data and information to answer questions of how. Wisdom carries a distinct element of value judgment and is often concerned with predicting the future and with value orientation.

Data has technical characteristics along multiple dimensions, but its economic characteristics are more complex. Data can generate value, so it has asset attributes. Data has characteristics of both goods and services. Much data is a public good that can be freely used, transformed and shared by anyone for any purpose; because most data is non-rivalrous, little data falls into the categories of private goods or common resources. Ownership of data is a complex issue both legally and practically, especially for personal data. It is therefore more appropriate to compare data to sunlight than to oil.

Data is processed and integrated with other data, analyzed to form actionable insights, and finally acted upon to generate value. The value of data is reflected at the micro level in the improvement of user utility, and at the macro level in the improvement of total factor productivity by the information, knowledge and wisdom extracted from data. Data value lacks objective measurement standards for three main reasons: first, the value of the same data to different people can differ greatly; second, the value of data changes over time; third, data generates externalities.

The measurement of data value includes absolute valuation and relative valuation. Absolute valuation is difficult, and there is no generally accepted method; industry currently relies mainly on the cost, income, market and questionnaire approaches, all of which have defects. Relative valuation, given several groups of data and a common task, evaluates each group's contribution to completing the task. Relative valuation is simpler than absolute valuation, and for quantitative data analysis tasks, Shapley values can be used.

Data comes in many types with different characteristics, resulting in different allocation mechanisms. These mechanisms all address the problems of information asymmetry and incomplete contracts in the allocation of data elements. This article discussed four such mechanisms.

First, data as public goods are generally provided by government departments using tax revenues. Government departments should open government data to the society and the market as much as possible on the premise of not involving confidentiality, so as to maximize the public value of government data.

Second, if data as a quasi-public good has clear ownership and excludability, club-good subscription models, the open banking model and the data trust model can be adopted.

Third, in the Internet economy, the ownership of much personal data is difficult to define clearly. In reality, the PIK (payment-in-kind) model is common: in essence, users exchange their attention and personal data for information and social services. But the PIK model has many disadvantages.

Fourth, much data is unsuitable for market transactions because it is non-excludable or non-rivalrous; in other words, market-oriented allocation is not the same as market transactions. In reality, there is no centralized, liquid data element market. Peer-to-peer data transactions (similar to over-the-counter trading) have long been happening, but they are very opaque and non-standardized, and illegal data transactions are a problem that cannot be ignored.

Beyond technology, data property rights can also be defined through institutional design. GDPR introduces fine-grained dimensions of data property rights, including the right to be forgotten, the right to portability, conditional authorization and the principle of data minimization, establishing an institutional paradigm for data governance that has been adopted by many countries and regions outside the EU. The core issue in personal data governance is privacy protection: for personal data, control and privacy are more important than ownership.

References

Ackoff, R.L., 1989, "From Data to Wisdom", Journal of Applied Systems Analysis, 16: 3-9.

Acquisti, A., C. Taylor, and L. Wagman, 2016, "The Economics of Privacy", Journal of Economic Literature, 54(2): 442-492.

Bennett Institute for Public Policy (BIPP), 2020, The Value of Data.

Jia R., D. Dao, B. Wang, F. Hubis, N. Hynes, N. Gurel, B. Li, C. Zhang, D. Song, and C. Spanos, 2019, "Towards Efficient Data Valuation Based on the Shapley Value".

PlatON, 2018, "PlatON: A High-Efficiency Trustless Computing Network".

Rowley, J., 2007, "The Wisdom Hierarchy: Representations of the DIKW Hierarchy", Journal of Information Science, 33(2): 163-180.


Deloitte and Ali Research Institute, 2019, "The Road to Data Capitalization: Valuation and Industry Practice of Data Assets".

Li Xiaojia, 2020, "Call for the Establishment of a Data Element Industrialization Alliance", Hong Kong Stock Exchange.
