Designing a smart city sustainability assessment information system based on the adapted SULPiTER methodology

The growing number of smart cities, the rise of technology and public engagement in urban management, and the scarcity of open data for evaluating sustainable urban development create the need for new sustainability assessment approaches. This study uses passive crowdsourcing together with the adapted SULPiTER (Sustainable Urban Logistics Planning to Enhance Regional freight transport) methodology to assess the sustainable development of smart cities. The proposed methodology considers economic, environmental, social, transport, and communication factors, as well as residents' satisfaction with the urban environment. SULPiTER relies on experts to select relevant factors and to determine their contribution to the value of a sustainability indicator. We propose an alternative approach based on automated data gathering and processing. To implement it, we build an information service around a formal knowledge base that accumulates alternative workflows for estimating indicators and allows automatic comparison of alternatives and aggregation of their results. A system architecture was proposed and implemented; it includes the Astana Opinion Mining service, which can be adjusted to collect opinions in various impact areas. The findings are valuable for the early identification of problems and for increasing the efficiency of planning and policies in sustainable urban development.


Introduction
The sustainable development of smart cities encompasses several essential facets and strategies designed to enhance quality of life, the efficiency of urban services and employment, and sustainability across economic, social, environmental, and cultural dimensions.
As defined by the United Nations Economic Commission for Europe (UNECE), a smart sustainable city is an innovative city that uses information and communication technologies and other means to achieve these goals. Technologies such as the Internet of Things (IoT) and artificial intelligence (AI) play a key role in the sustainable development of smart cities. They help to reduce energy costs and greenhouse gas emissions, improve the safety and well-being of residents, and promote smart solutions to urban development problems. A key part of this approach is the development of decision support systems that assess multi-level aspects of the smartness of city decisions and recommend an optimal set of improvement strategies.
Numerous rankings assess smart city sustainability. Examples include the Global Smart City Performance Index (GSCI), the Smart City Index, and the University of Navarra index. Our approach considers economic, environmental, social, transport, and communication factors, as well as residents' satisfaction with the urban environment. This multidimensional representation helps to create a more holistic and realistic definition of a smart city. Additionally, by applying passive crowdsourcing, we reduce the social barriers that prevent certain groups of citizens from influencing decision-making. Through the analysis of social networks, we obtain data from a wide range of people, including those who are traditionally limited in direct participation in discussions, debates, and decision-making processes.
The second significant gap in research concerning the assessment of smart city sustainability is the issue of data management. Data management is a key component in the development of sustainable smart cities (Franke and Gailhofer, 2021). According to the authors of this work, market-oriented approaches, whether based on the trade of private data or on the free use of open data, are unlikely to stimulate sustainable innovations by themselves and require additional regulatory intervention in the processes of data collection, access, and control over their use, as these processes have far-reaching consequences for decision-makers. The overall task of 'sustainable development' can be characterized as achieving a balance between economic, social, and environmental goals and resolving the inevitable conflicts between these goals (Vaidya and Chatterji, 2020). In the context of a 'smart city,' this particularly includes decisions about data management, i.e., who can use what data and for what purposes. Our research is currently focused on aspects of data collection and analysis, but we recognize that without such regulations, any data derived from open or proprietary sources can become a tool in a conflict of interests.
The paper is organized as follows: related research, an explanation of the methodology, the system architecture, results, a discussion of the findings, and conclusions.

Related works
The interest in achieving sustainable development goals encourages the development of urban sustainability approaches. The following are some of the significant works on indicator-based tools for a comprehensive assessment of the sustainability of urban development. Xing et al. (2009) developed a model for assessing the sustainability of urban development (UD-SAM), which allows decision-makers to determine sustainability indicators (economic, environmental, and social) and can lead to a more holistic assessment of the impact of urban environment elements on sustainability. UD-SAM is based on the Sustainability Assessment Model (SAM) originally developed in the oil industry. This work is relevant to our research because it also describes the adaptation of an oil industry sustainability index to assess the sustainability of urban development. It also uses the expert assessment method (a questionnaire) to assign importance to indicators of sustainable development within the index.
Nyusuppova et al. (2022) calculated the integral sustainable urban development index (SUDI), a methodology for assessing the sustainability of cities proposed by the Sustainable Growth Management (SGM) Agency, based on twenty-seven statistical indicators processed in the spatial geodata database of cities in Kazakhstan for 2007-2019. According to the results, the cities of Kazakhstan have a middle level of sustainability. This research is relevant because it offers an integral index of sustainable urban development that covers economic, environmental, and social blocks of indicators for Kazakhstani cities and because it faces related constraints.
In the context of advancing urbanization and the growing integration of technology into urban environments, sentiment analysis plays a crucial role in understanding public perception and emotions related to smart city initiatives. Below, we review several key works that explore sentiment analysis frameworks, methodologies, and applications within the smart city domain.
The study by Ahmed et al. (2016) investigates the challenges and opportunities in sentiment analysis frameworks, architectures, and applications within the smart city domain. This work is relevant to our research, as it underscores the critical role of big data in establishing the information foundation of a smart city. This foundation requires the seamless integration of various city components, such as pollution sensors, road systems, social networks, and smartphones, all of which generate large amounts of diverse data, whether public or private, structured or unstructured, streaming or static. The authors primarily focus on textual data for sentiment analysis but acknowledge the potential of analyzing photo data (e.g., selfies with city backgrounds) and video data. They propose an architectural framework for sentiment analysis tailored for smart city management. This framework supports a progressive transition from sentiment analysis of text data to the analysis of multimodal data. To achieve this, they use the Apache Hadoop big data ecosystem, incorporating its Spark engine, along with R and Python code within Hadoop.
Alotaibi et al. (2019) examine the challenges of sentiment analysis for short messages, specifically tweets, in Arabic. The authors focus on sentiment analysis for Saudi Arabian Arabic, given its unique characteristics. Although Saudi Arabia had the fourth-largest number of Twitter users globally as of January 2019, research in this domain remains limited. The authors outline a four-module architecture for their sentiment analysis system, encompassing data collection, preprocessing, tweet classification, and the sentiment analysis phase. Furthermore, they provide a suite of tools, libraries, and resources to facilitate sentiment analysis in the Saudi dialect of Arabic.
Arku et al. (2022) present an extensive study on the use of sentiment analysis of Twitter messages in the context of smart cities. The paper summarizes the experiences of four African smart city projects, revealing generally positive sentiments associated with these initiatives. However, the authors emphasize the influence of 'smart city mirages,' wherein branding campaigns effectively divert public attention from existing urban challenges and their attendant negative socio-economic consequences. This work contributes to the broader global discourse regarding citizen engagement in smart city processes, with implications for urban planning and management practices.
Hilal et al. (2022) explore the role of sentiment analysis as a tool to address health crises in smart cities, which is particularly pertinent in light of the recent COVID-19 pandemic. The authors introduce a novel methodology, Artificial Intelligence Based Sentiment Analysis for Health Crisis Management (AISA-HCM), designed to discern emotions in Twitter user data. The methodology involves a series of preprocessing steps, including denoising, tokenization, normalization, and stemming. Subsequently, data is fed into the BSO-DBN (Brain Storm Optimization with Deep Belief Network) model for feature extraction. The BAS-ELM (Beetle Antenna Search with Extreme Learning Machine) model is then employed for data classification, determining sentiment class labels. The authors report that their AISA-HCM method exhibits superior performance when compared to state-of-the-art sentiment analysis approaches.
In another advanced sentiment analysis model for smart city residents' reviews, proposed by Sundararajan et al. (2017), a bidirectional encoder representations from transformers-deep convolutional neural network (BERT-DCNN) model is introduced. This model uses BERT as a pre-trained language model for generating embeddings and employs parallel layers of augmented convolutional neural network in conjunction with a global average pooling layer for fine-tuning. The BERT-DCNN model is designed to handle dimensionality reduction, accommodate dimensionality increases without significant information loss, capture long-term dependencies using different expansion rates, and incorporate a knowledge base for sentiment analysis at a conceptual level. Experimental results demonstrate the superior performance of this model when compared to other machine learning models.
The aforementioned studies provide insights into the integration of citizen sentiments into the assessment of sustainable development criteria for smart cities. Furthermore, there exists a body of literature on the methodology for constructing such assessments, and we now introduce the most noteworthy contributions in this domain.
Ismagilova et al. (2019) provide a comprehensive review of the information systems (IS) aspects of smart cities, covering multiple dimensions and themes. The authors reviewed over 100 papers dedicated to the information systems of smart cities and noted that the focus of research has shifted from technological aspects to a holistic perspective, emphasizing factors such as citizen participation, quality of life, and sustainability. However, most of the articles are based on simulations and surveys rather than on specific case data. From a geographical standpoint, the majority of these studies were conducted in Spain, the USA, India, the UK, and Italy. The technologies most often mentioned in the context of smart cities include the Internet of Things (IoT), cloud computing, and Bluetooth. Our work therefore fills the following gaps in this line of research: 1) we describe a specific case; 2) we describe Kazakhstan; 3) our technologies rely less on the Internet of Things and more on social media mining.

SULPiTER methodology
The SULPiTER tool is a decision support system for policymakers that facilitates the process of developing alternative scenarios for urban logistics, based on the division of the city into functional urban areas (FUAs) (D 1.3.2 Think tank transnational platform on freight mobility planning in FUAs: vision document and setup, 2017; Software Tool. Handbook for users, 2017). The tool provides a clear picture of the distribution of urban freight traffic in each FUA and includes a modeling system for evaluation using a performance indicator, the Logistics Sustainability Index (LSI).
The application of the system as a decision support tool consists of three main stages (Figure 1). The SULPiTER project is designed to adapt the criteria for the sustainable functioning of the urban system, one of which is the Logistics Sustainability Index (LSI). The SULPiTER methodology provides a generalized framework for computing the index value and a procedure for setting the parameters of the algorithm based on the opinions of users. The LSI calculation process consists of six steps (Figure 2).
Step 1: selecting the impact area. Logistics sustainability is calculated from the aggregate performance indicator of all cargo-related activities across seven impact areas, one of which is user perception. The user chooses the impact areas for which the assessment will be performed. Each of the impact areas consists of several criteria, which are divided into indicators (Figure 3). The indicators are accompanied by explanations. The explanations may include units of measurement, specific numerical indicators, recommendations for evaluating the numerical value of the indicator, and recommendations for using the Likert scale from 1 to 5 to assign an expert evaluation value to the indicator.
Step 2: selecting criteria. Criteria are selected for the chosen impact areas, taking the stakeholders' point of view into account. In this study, users of the urban environment are considered the only stakeholder category.
Step 3: selecting indicators. At this stage, the user selects the final indicators from the list that is provided for each criterion (and impact area).
Step 4: weighing process. Users' opinions are used to assign weights (preferences) to impact areas, criteria, and indicators. One of several common weighing methods can be used, in particular the assignment of weights based on pairwise comparison within the corresponding impact area.
The user is asked to indicate the importance (or preference) of element 1 compared to element 2, rating it on a scale from 1 to 9, where: 1 = equally important; 3 = moderately more important; 5 = strongly more important; 7 = very strongly more important; 9 = exceptionally more important.
All intermediate integer values are possible. If element 1 is less important than element 2, the corresponding reciprocal value is assigned (for example, 1/5).
The matrix A (n × n), called the "pairwise comparison" or "reciprocal" matrix, is filled in by the expert, where n is the number of elements being compared. The diagonal cells are equal to one. The cells on one side of the diagonal are filled in with the experts' ratings, while the mirrored cells on the other side are equal to the reciprocals of those ratings.
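The construction of the reciprocal matrix can be sketched as follows. The source does not state how weights are derived from the matrix, so the normalized row geometric mean used here is an assumption (it is one common approximation in pairwise comparison methods), and the function names and example ratings are hypothetical.

```python
from fractions import Fraction
from math import prod


def build_reciprocal_matrix(n, upper):
    """Build an n x n pairwise comparison matrix.

    `upper` maps (i, j) with i < j to the rating of element i against
    element j: 1..9, or a reciprocal such as Fraction(1, 5) when element i
    is less important than element j.
    """
    matrix = [[Fraction(1)] * n for _ in range(n)]  # diagonal stays 1
    for (i, j), rating in upper.items():
        matrix[i][j] = Fraction(rating)
        matrix[j][i] = 1 / Fraction(rating)  # mirrored cell is the reciprocal
    return matrix


def geometric_mean_weights(matrix):
    """Approximate weights as normalized geometric means of the rows (assumed method)."""
    n = len(matrix)
    means = [float(prod(row)) ** (1.0 / n) for row in matrix]
    total = sum(means)
    return [m / total for m in means]


# Hypothetical ratings for three elements of one impact area.
ratings = {(0, 1): 3, (0, 2): 5, (1, 2): 3}
matrix = build_reciprocal_matrix(3, ratings)
weights = geometric_mean_weights(matrix)
```

With n elements the expert has to supply n(n − 1)/2 ratings, which grows quickly as the number of compared elements increases.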
However, because of the large number of indicators to be weighed, the pairwise comparison method does not seem effective in this study. Instead, the expert evaluation method on the Likert scale was used for each criterion (i.e., a group of indicators), which the calculation method permits.
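Step 5: normalization. Each indicator value is divided by the maximum value of that indicator:
$$\bar{x}_{im} = \frac{x_{im}}{x_m^{\max}},$$
where $\bar{x}_{im}$ is the normalized indicator value for option $i$ and indicator $m$; $x_{im}$ is the corresponding raw value; and $x_m^{\max}$ is the maximum value of the indicator.
Step 6: Logistics Sustainability Index (LSI) calculation. The LSI is the sum of the products of the normalized indicator values and their normalized weights for the study period, usually a year:
$$LSI_a = \sum_{m} w_m \, \bar{x}_m,$$
where $LSI_a$ is the logistics sustainability index assessing the effectiveness of impact area $a$; $\bar{x}_m$ is the normalized value of indicator $m$, taken with a minus or plus sign depending on its contribution; and $w_m$ is the weight of indicator $m$.
The overall logistics sustainability index is calculated as a weighted sum of the $LSI_a$ values:
$$LSI = \sum_{a} W_a \, LSI_a,$$
where $W_a$ are the weights of the impact areas.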
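The two aggregation steps can be illustrated with a minimal sketch. It assumes that normalization divides each value by the maximum over the compared options and that Likert weights are rescaled to sum to one. The indicator names, values, and weights below are hypothetical and are not taken from the study.

```python
def normalize(values):
    """Divide each raw indicator value by the maximum over the compared options."""
    peak = max(values)
    return [v / peak for v in values]


def impact_area_lsi(normalized, weights, signs):
    """Weighted sum of signed, normalized indicator values for one impact area.

    `weights` may be raw Likert scores; they are rescaled to sum to one.
    `signs` holds +1 for indicators that improve sustainability, -1 otherwise.
    """
    total = sum(weights)
    return sum(s * x * w / total for x, w, s in zip(normalized, weights, signs))


def overall_lsi(area_scores, area_weights):
    """Weighted sum of impact-area indices with rescaled area weights."""
    total = sum(area_weights.values())
    return sum(area_scores[a] * area_weights[a] / total for a in area_scores)


# Hypothetical "environment" indicators for two options (2019 and 2022).
raw = {"green_area": [10.0, 12.5], "emissions": [80.0, 100.0]}
norm = {name: normalize(vals) for name, vals in raw.items()}
names = ["green_area", "emissions"]
env = [
    impact_area_lsi([norm[n][i] for n in names], weights=[3, 1], signs=[+1, -1])
    for i in range(2)
]
total_2022 = overall_lsi(
    {"environment": env[1], "society": 0.25},
    {"environment": 1, "society": 1},
)
```

Because the normalization is relative to the maximum over the compared periods, the resulting values are meaningful only for comparing those periods with each other.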
Comparing LSI values for different periods can be used to assess the dynamics of sustainable development of city logistics, as well as to judge the effectiveness of measures and projects in the field of urban logistics sustainability. A visualization of the final logistics sustainability index (LSI) is shown in Figure 4. Since the LSI was originally intended to assess urban logistics sustainability, we supplement its structure with criteria and indicators reflecting the degree of sustainable and smart urban development in general. To this end, the most widely used indices of sustainable and smart urban development were reviewed (Table 1). The criteria for including indicators were:
• correspondence of the assessment subject;
• criteria characteristics;
• availability of similar data in official sources in Kazakhstan.
A review of the sustainable development indices showed that the data sources for calculating the indices in Table 2 are:
• statistical data;
• results of empirical data collection, experiments, and observations;
• sociological surveys of the urban population;
• secondary data from reports of research organizations and businesses.
Further, the availability of the required or similar open access data on Kazakhstani cities was reviewed. A limitation is that the search was restricted to online sources. The search was carried out by keywords for each of the indicators.
Based on the review of indicators, the "Smart Government" impact area was added to the basic list of impact areas for LSI calculation. The structure of the criteria for assessing this impact area is presented in Table 2. The authors analyzed the criteria included in the field of "smart government" and systematized the most widely used indices for assessing urban sustainability, based on guidelines for calculating urban sustainability indices as well as on reports from a number of cities that differ in size, population, and level of economic and social development. The results of the analysis are presented in the article "Content and structure of indices for assessing sustainable urban development: prospects for application in Kazakhstan" (Sadykova et al., 2022).
Another aspect to consider when defining criteria within the scope of smart government is the city scale. Some sustainability indices lack comparisons between megacities and smaller cities. The study assessed the alignment of data from national statistics, international studies, and open sources with indicators in the China Urban Sustainability Index, European Green Capital, and the Citizen Centric Cities Sustainable City Index. Correspondence analysis revealed that the Citizen Centric Cities Sustainable City Index aligns most closely with the data available for calculating a sustainability index for Kazakhstan's cities. It is recommended to establish systematic monitoring of criteria related to "smart government" or similar indicators to track and enhance the sustainability of Kazakhstan's cities over time.

Data and datasets
The collection, processing, and analysis of requests from city residents through social networks, together with sentiment analysis of comments and discussions in thematic social media communities (groups, channels, and forums), is a relevant research method for studying public opinion in decision making, including when implementing the "Smart City" concept (Yue et al., 2022; Malik et al., 2022). According to Statcounter Global Stats (Ranking.kz, 2022), the most popular social media platforms in Kazakhstan are Facebook, Instagram, YouTube, Pinterest, and Telegram. Therefore, official newsgroups about Astana and media accounts on Facebook, Instagram, and YouTube, as well as popular Telegram channels, were chosen as data sources for the collection.
Thus, the main data for the analysis were user-generated text data (comments, posts, messages). In this project, we adhered to best practices for protecting personal and sensitive data from disclosure (Loukides et al., 2018; Mohammad, 2022; Rambocas and Pacheco, 2018). All collected comments and messages were therefore used without user IDs, and messages that contained the personal data of citizens were deleted using named entity recognition and confidential information detection algorithms. Since comments in channels are barely moderated and can be irrelevant, these messages were automatically processed and cleaned (Garg and Sharma, 2022; Omuya et al., 2023).
Web crawlers were used for data collection, and natural language processing pipelines built on Python libraries were used for data processing (Altinok, 2021; Kousis and Tjortjis, 2021). The pipeline then extracted features and detected message topics using keywords defined in accordance with the smart city domain. About 90% of the messages from YouTube and Telegram were found to be irrelevant or spam (see Figure 5). A preliminary analysis of all posts showed that about 80% of user comments were in Russian, about 15% were classified as mixed or foreign, and more than 5% of comments were written in Kazakh. The pre-processed and cleaned texts were then mined for information valuable for calculating the criteria for the sustainable development of a smart city.
A total of 7,934 messages were collected from the other sources (Facebook, Instagram). The small number of messages is explained by the complexity of collection, the blocking of automatic collectors by these platforms, and the legal restrictions on access to user data introduced by Meta (Meta, 2021).

Methods for indirect calculation of SULPiTER indicators based on passive crowdsourcing
Building upon the general taxonomy of areas of responsibility, criteria, and indicators for sustainable development in smart cities discussed earlier, the system currently identifies five indicators that can be indirectly computed using passive crowdsourcing technologies. For each of these indicators, a seed keyword dictionary has been created to select relevant user-generated messages. If a message contains words from multiple dictionaries, the principle of a simple majority is applied. In other words, the message is categorized under the indicator whose dictionary contains the most words from that message. For example, for the "satisfaction with personal housing situation" indicator we use the following keywords: apartment, rent, housing, district, tenants, rented housing, owner, communal services, view, infrastructure, heating networks, water supply, etc. For the "satisfaction with access to the educational system" indicator we use the following keywords: school, kindergarten, development, center, child, children, kid, skills, courses, classes, etc. For "engaging urban space" we use the following keywords: bus, taxi, bicycle, ice, on foot, pedestrian, transport, route, cargo, stampede, waiting, passenger, railway, airport, timetable, stop, parking, driver, ticket, jams, traffic, etc.
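The simple-majority rule can be sketched as below. This is an illustration only: the dictionaries are shortened English subsets of the keywords listed above, the short indicator names are hypothetical, and the sketch matches exact lowercase tokens without the lemmatization that Russian or Kazakh text would need. The source does not say how ties are resolved, so returning no label for a tie is an assumption.

```python
import re

# Shortened, illustrative subsets of the seed dictionaries.
SEED_DICTIONARIES = {
    "housing": {"apartment", "rent", "housing", "district", "tenants", "owner"},
    "education": {"school", "kindergarten", "child", "children", "courses", "classes"},
    "urban_space": {"bus", "taxi", "bicycle", "pedestrian", "transport", "route", "parking"},
}


def classify(message, dictionaries=SEED_DICTIONARIES):
    """Return the indicator whose dictionary matches the most words, or None."""
    tokens = re.findall(r"\w+", message.lower())
    counts = {
        name: sum(1 for token in tokens if token in words)
        for name, words in dictionaries.items()
    }
    best = max(counts.values())
    if best == 0:
        return None  # no keyword matched: the message is treated as irrelevant
    winners = [name for name, count in counts.items() if count == best]
    # Assumption: a tie between dictionaries leaves the message unlabeled.
    return winners[0] if len(winners) == 1 else None


label = classify("The bus route to the school was cancelled, no transport at all")
```

Here the message matches three transport keywords and one education keyword, so it is assigned to the urban space indicator.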
Message classification under specific indicators relies on both metadata and header analysis. For example, kindergarten performance reviews are tagged as 'educational system access satisfaction,' whereas feedback on related clinics falls under 'healthcare accessibility.' In cases of classification conflict, the dictionary-based approach takes precedence. The system architecture allows these algorithms to be replaced with improved ones in the future.
Sentiment analysis is conducted on a three-class scale ('1' for positive, '0' for neutral, and '2' for negative). The resulting labels determine the overall indicator value, which ranges from −1 to 1 depending on the volume of positive, neutral, and negative messages, as outlined in Figure 6.
Choosing the right sentiment analysis model is crucial and depends on the data scale, the available resources, and the goals of the analysis. Pre-trained models such as BERT and Generative Pre-trained Transformer (GPT) are well-suited starting points because of their precision and adaptability across natural language tasks. Fine-tuning these models on specific datasets can further improve their relevance to sustainability evaluation contexts. In our study, we used the blanchefort/rubert-base-cased-sentiment-rusentiment model to analyze reviews. This model, trained on the RuSentiment dataset of 30,521 Russian social media comments, achieved 79% accuracy. This performance is in line with the anticipated 70%-82% accuracy range for foundational deep transfer learning models in Russian sentiment analysis.
The reliability of assessments can be compromised when citizens' opinions are inadequately represented. Negative opinions tend to be published more frequently than positive ones, since negative posts draw attention to problems, while positive posts are often perceived as having no effect when things are already considered good. To correct this imbalance, a correction factor, denoted k1, can be introduced into the calculation of the indicator. This factor, set to be greater than 1, amplifies the influence of positive reviews relative to negative ones, ensuring a more balanced assessment. Furthermore, a separate correction factor, k2, with a value of 0.5, can be incorporated into the formula to account for the significance of neutral reviews. This approach aims to provide a more equitable analysis, considering the complexity of interpreting neutral feedback.
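One possible form of such a corrected indicator is sketched below. The exact formula is given in Figure 6 and is not reproduced in the text, so the expression used here, (k1·P − N) / (k1·P + k2·U + N) with P, U, and N the counts of positive, neutral, and negative messages, is an assumption chosen only because it stays within the stated range from −1 to 1. The default k1 = 1.5 is likewise hypothetical; the text only requires k1 > 1.

```python
from collections import Counter

# Label convention from the text: 1 = positive, 0 = neutral, 2 = negative.
POSITIVE, NEUTRAL, NEGATIVE = 1, 0, 2


def indicator_value(labels, k1=1.5, k2=0.5):
    """Aggregate per-message sentiment labels into a value in [-1, 1].

    Assumed formula (not the one from Figure 6):
        (k1 * P - N) / (k1 * P + k2 * U + N)
    """
    counts = Counter(labels)
    positive = k1 * counts[POSITIVE]
    neutral = k2 * counts[NEUTRAL]
    negative = counts[NEGATIVE]
    denominator = positive + neutral + negative
    if denominator == 0:
        return 0.0  # no messages: report a neutral value
    return (positive - negative) / denominator


# Two positive, four neutral, and three negative messages.
value = indicator_value([1, 1, 0, 0, 0, 0, 2, 2, 2])
```

In this example the two amplified positive messages exactly offset the three negative ones, so the indicator evaluates to zero.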
To make the model's decision-making process clearer and to explain the overall score derived from individual opinions, we can use integrated gradients, a technique proposed by Sundararajan et al. (2017). This approach shows how much each input feature influences the final decision by assigning an attribution value to it. It is sensitive to input changes and does not depend on the specific implementation of the neural network. By applying this method to each review, we can see what drives the model's decisions.
We suggest pairing this with the Valence Aware Dictionary and sEntiment Reasoner (VADER) method of Hutto and Gilbert (2014), which uses a dictionary of words labeled as positive or negative. Traditionally, an expert creates this dictionary, and reviews are scored by the number of positive, neutral, and negative words they contain. In our case, however, we create this dictionary automatically, identifying the most impactful words for positive, neutral, and negative comments through integrated gradients.
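The automatic construction of such a dictionary can be sketched as follows. The sketch assumes that per-token attribution scores have already been computed for each review (for instance with integrated gradients) and are passed in as (token, score) pairs. The input format, function names, and example scores are hypothetical.

```python
from collections import Counter, defaultdict


def top_words(attributions, k=3):
    """Return the k tokens with the largest absolute attribution score."""
    ranked = sorted(attributions, key=lambda pair: abs(pair[1]), reverse=True)
    return [token for token, _ in ranked[:k]]


def build_lexicon(reviews, k=3):
    """Count how often each word is among the top-k drivers of each predicted class.

    `reviews` is an iterable of (predicted_label, attributions) pairs, where
    `attributions` is a list of (token, score) pairs for one review.
    """
    lexicon = defaultdict(Counter)
    for predicted_label, attributions in reviews:
        lexicon[predicted_label].update(top_words(attributions, k))
    return lexicon


# Hypothetical attribution scores for two reviews predicted as negative.
reviews = [
    ("negative", [("heating", 0.6), ("is", 0.01), ("broken", 0.9), ("again", 0.2)]),
    ("negative", [("broken", 0.8), ("lift", 0.3), ("in", 0.02), ("house", 0.1)]),
]
lexicon = build_lexicon(reviews)
```

The resulting per-class word counts can then be inspected or scored with a lexicon-based method to check whether the words driving each class carry the expected polarity.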

Formalization of problem domains using computational models as a basis for building intelligent systems
Let us consider the key requirements for the architecture of an intelligent information system for assessing the sustainability of a city's development. Firstly, during the system's operation, new mathematical models and methods of data analysis and processing will be developed in related subject domains (economics, logistics, ecology, energy, etc.). Thus, specific subject models should not be rigidly embedded in the system. It should be possible to integrate new algorithms for analysis, modeling, and forecasting into the system.
Secondly, the sources of information in crowdsourcing are heterogeneous, incomplete, and inaccurate: the data coverage (by time, social categories of citizens, etc.) is incomplete, data formats vary, Internet resources disappear and appear, the reliability of data may be questionable, etc. The system must therefore allow models and data collection and processing modules to be replaced, and must make it possible to fill information gaps through modeling.
These requirements call for a systematic approach to organizing the collection of models used by the intelligent system. As a theoretical basis for its construction, the mathematical apparatus of the theory of synthesis of parallel programs on computational models was adapted and applied. The key concept of the theory is a computational model: a set of triplets of the form <in, mod, out>, where "in" and "out" are sets of variables of the problem domain, and "mod" is a computational module capable of calculating the values of the output variables (out) from the input ones (in). Within this framework, the computation of a criterion value from the values of the corresponding indicators can be represented by such a triplet, where "in" is a set of indicators, "out" is the criterion, and "mod" is a procedure (subroutine) that computes the corresponding weighted sum. Together, the triplets allow the system to automatically perform the computational process from the source data to the value of the target index.
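A minimal sketch of this idea is given below: each triplet is stored as an operation, and a simple forward-chaining loop applies every operation whose inputs are known until the target variable is computed. This illustrates the concept only and is not the derivation engine of the described system. The class, function, and variable names as well as the numeric values are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Dict, FrozenSet


@dataclass(frozen=True)
class Operation:
    """One <in, mod, out> triplet of a computational model."""
    inputs: FrozenSet[str]
    module: Callable[[Dict[str, float]], Dict[str, float]]
    outputs: FrozenSet[str]


def weighted_sum(out_name, weights):
    """Build a triplet whose module computes a weighted sum of its inputs."""
    def module(values):
        return {out_name: sum(values[name] * w for name, w in weights.items())}
    return Operation(frozenset(weights), module, frozenset({out_name}))


def derive(model, known, target):
    """Apply operations with satisfied inputs until the target is computed."""
    values = dict(known)
    progress = True
    while target not in values and progress:
        progress = False
        for op in model:
            if op.inputs.issubset(values) and not op.outputs.issubset(values):
                values.update(op.module(values))
                progress = True
    if target not in values:
        raise ValueError(f"cannot derive {target!r} from the given data")
    return values


# Hypothetical chain: indicators -> criterion -> impact area -> index.
model = [
    weighted_sum("criterion_housing", {"ind_rent": 0.5, "ind_utilities": 0.5}),
    weighted_sum("area_society", {"criterion_housing": 1.0}),
    weighted_sum("lsi_star", {"area_society": 0.5, "area_environment": 0.5}),
]
known = {"ind_rent": 0.5, "ind_utilities": 0.25, "area_environment": 0.125}
result = derive(model, known, "lsi_star")
```

Adding a new way to compute a variable then amounts to appending another operation to the model, without changing the derivation loop.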
The essential property of computational models as a basis for building an intelligent system is that a set of modules can be redundant, in the sense that it can contain many alternative ways to compute the same variable, while different alternatives may be preferable depending on the specific circumstances of the computation. The way to compute a variable is selected by the system automatically, based on a formal specification of operations that describes their essential properties. This approach to building information systems is discussed by the authors in more detail in (Gorodnichev and Lebedev, 2021). The initial mathematical apparatus of computational models has been adapted for the purpose of creating an intelligent information system. In particular, the original model, which implies a single assignment of variables, has been extended with the possibility of multiple assignments and subsequent recomputation of dependent variables. The computational model concept was also extended with a description of non-functional attributes, which allows the degree of reliability of calculations to be described and controlled.
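Selection among redundant alternatives can be illustrated with the following sketch. It assumes that each alternative carries a single numeric reliability attribute and that the most reliable applicable alternative is preferred. The actual selection criteria and attribute set of the system are not specified in the text, so the names, reliability values, and rescaling of the sentiment score are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Tuple


@dataclass
class Alternative:
    """One way to compute a variable, with a non-functional attribute."""
    name: str
    inputs: Tuple[str, ...]
    compute: Callable[[Dict[str, float]], float]
    reliability: float  # hypothetical attribute in [0, 1]


def best_applicable(alternatives, available):
    """Pick the most reliable alternative whose inputs are all available."""
    applicable = [
        alt for alt in alternatives
        if all(name in available for name in alt.inputs)
    ]
    if not applicable:
        return None
    return max(applicable, key=lambda alt: alt.reliability)


def estimate(alternatives, available):
    """Return (alternative name, value, reliability), or None if nothing applies."""
    chosen = best_applicable(alternatives, available)
    if chosen is None:
        return None
    return chosen.name, chosen.compute(available), chosen.reliability


# Two hypothetical ways to estimate satisfaction with housing.
alternatives = [
    Alternative("survey", ("survey_score",), lambda v: v["survey_score"], 0.9),
    Alternative(
        "crowdsourced",
        ("sentiment_score",),
        lambda v: (v["sentiment_score"] + 1) / 2,  # rescale [-1, 1] to [0, 1]
        0.6,
    ),
]
available = {"sentiment_score": -0.5}
first = estimate(alternatives, available)

# A new assignment arrives; the dependent variable is recomputed.
available["survey_score"] = 0.4
second = estimate(alternatives, available)
```

When only crowdsourced data are available the less reliable alternative is used; once survey data are assigned, recomputation switches to the more reliable one.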

Results of LSI* calculation according to the adapted SULPiTER methodology
The original SULPiTER methodology for assessing the sustainability of urban logistics has been adapted for use as a tool for assessing the sustainability of urban development. The groups of indicators for which the LSI index was calculated are supplemented with indicators characterizing sustainable urban development, taken from the structurally closest existing indices for assessing sustainable urban development and smart cities. We called the adapted index LSI*. The Smart Government impact area has been added to supplement the index. The weights representing the importance of indicators were determined via a survey of residents of the city of Astana. In this study, they are considered the main stakeholders with experience in the use of the city. The calculation was made on the basis of 29 confirmed questionnaires. Based on the weights assigned by users, the sub-indexes for criteria evaluation in the LSI* index structure were calculated. A visualization of the results of the assessment of the sustainable development of Astana using the LSI* index for 2019 and 2022 is shown in Figure 7. The results demonstrate growth of the adapted LSI* in the period from 2019 to 2022. The increase in the indicator values is mainly due to an increase in the values of criteria within the "economy and energy" impact area, while the "environment" impact area, on the contrary, shows a rapid decline. A slight positive trend is also observed for the "society" impact area.

Data mining collection and algorithm results
We analyzed three indicators of sustainable development: satisfaction with personal housing situation, satisfaction with access to the educational system, and engaging urban space. For these indicators, 4651, 1246 and 831 reviews were processed, respectively. The resulting indicator values are given in Table 3. The model's behavior was interpreted by observing how it handles individual cases. To do this, the integrated gradients method was used in the model output code. This method indicates which words had the greatest impact on the model labeling a sentence as positive, negative, or neutral. However, it is designed to explain individual predictions. To give an aggregate view of how the sentiment model makes its decisions, we devised our own solution: we used VADER, a rule-based (lexicon-based) method, to evaluate our model. For each prediction, a list of the three most important words is compiled. This is done for every sentiment class and every sustainability indicator. VADER then categorizes the sentiment of each word based on predefined word dictionaries associated with sentiment scores or categories. The results of these calculations (see Figure 8) give sufficient grounds to consider the indicator values reliable. For all indicators, neutral words predominate, which can be explained by the fact that VADER does not consider phrase dictionaries, only individual words. In addition, the positive and negative sentiment classes contain their respective word fractions. That is, the class of 'negative reviews' is characterized by a significantly higher proportion of negative words across all indicators, while the class of 'positive reviews' is characterized by a much higher proportion of positive words and only a minimal presence of negative lexical elements. In general, the level of residents' satisfaction with housing conditions, the quality of education and urban space remains below average.
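The aggregate check described above can be sketched in a few lines. This is a minimal illustration, not the authors' code: the tiny lexicon, the threshold, the function names and the sample predictions are all assumptions, and the toy dictionary merely stands in for the much larger VADER lexicon.

```python
from collections import Counter

# Toy stand-in for a sentiment lexicon (assumption): real VADER ships a much
# larger, empirically scored dictionary. Words and scores are illustrative.
TOY_LEXICON = {
    "good": 1.9, "clean": 1.5, "great": 3.1,
    "bad": -2.5, "dirty": -1.9, "expensive": -1.0,
}

def word_polarity(word, threshold=0.5):
    """Classify a single word as positive, negative, or neutral."""
    score = TOY_LEXICON.get(word.lower(), 0.0)
    if score >= threshold:
        return "positive"
    if score <= -threshold:
        return "negative"
    return "neutral"

def polarity_shares(predictions):
    """predictions: iterable of (predicted_class, top_words) pairs.

    Returns {predicted_class: {polarity: share of top words}}.
    """
    counts = {}
    for label, words in predictions:
        counts.setdefault(label, Counter()).update(word_polarity(w) for w in words)
    shares = {}
    for label, counter in counts.items():
        total = sum(counter.values())
        if total == 0:
            continue  # no words recorded for this class
        shares[label] = {
            p: counter[p] / total for p in ("positive", "neutral", "negative")
        }
    return shares

# Hypothetical model outputs: predicted class plus its three most important words.
sample = [
    ("negative", ["dirty", "yard", "bad"]),
    ("positive", ["great", "school", "clean"]),
    ("negative", ["expensive", "rent", "flat"]),
]
shares = polarity_shares(sample)
```

If the sentiment model is behaving sensibly, the share of negative words should be visibly higher in the 'negative' class than in the 'positive' class, which is the pattern reported for Figure 8.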

Intelligent system construction results
Based on the proposed approach, the information system was designed and implemented (Figure 9). The system is built around a subsystem of active knowledge, called Aka. This subsystem contains a computational model that describes the computational processes of the information system, as well as an interpreter capable of performing computations based on this description. In the simplest case, the variables of the computational model are indicators, criteria, spheres of influence and the sustainability index (the modified LSI*). To perform a computation, the values of the indicator variables are set, and the index variable is marked as the demanded one. The operations of the computational model define the formulas for computing criteria from indicators, spheres of influence from criteria, and so on. The interpreter uses this description to compute the desired index by executing the operations in order.
The top-level architecture consists of the following main components: web crawlers and source data monitors, data storage, a backend server, web interfaces and external high-performance computing hardware. Web crawlers and source data monitors are services that scan, gather and upload raw source data to the system. A user can also upload new source data manually via the user interface. The uploaded data are preprocessed and stored in the persistent storage database subsystem. The database subsystem may comprise multiple storages under different database management systems (SQL (Structured Query Language), NoSQL (Not Only SQL), etc.) that are accessed via a uniform web service application programming interface (API). The backend server performs data processing and provides access via web-based interfaces to three categories of users: system administrators, knowledge base engineers and users. A knowledge base engineer maintains the knowledge base, represented as a set of computational models, as well as data processing modules. A user accesses the current LSI* estimate and explores its details, intermediate data and source data. A user can also formulate simulation requests and observe the results (the 'what if' case). Data processing is carried out by the Aka subsystem based on the knowledge base. External high-performance computing resources may be employed for processing. The Aka subsystem consists of a knowledge base (computational models and data processing modules), a derivation and execution engine and a jobs manager.
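To make the idea of a computational model with a demanded variable more concrete, the following is a minimal sketch of a demand-driven interpreter. It is not the Aka implementation: the variable names, the weights, the weighted-mean formulas and the `derive` function are hypothetical and chosen only for illustration.

```python
def weighted_mean(weights):
    """Build an operation that averages its inputs with the given weights."""
    def op(*values):
        return sum(w * v for w, v in zip(weights, values)) / sum(weights)
    return op

# Computational model (illustrative assumption):
# output variable -> (input variables, operation)
OPERATIONS = {
    "criterion_housing": (("ind_housing",), weighted_mean([1.0])),
    "criterion_urban": (("ind_education", "ind_space"), weighted_mean([1.0, 1.0])),
    "area_social": (("criterion_housing", "criterion_urban"), weighted_mean([2.0, 1.0])),
    "index": (("area_social",), weighted_mean([1.0])),
}

def derive(demanded, known, operations):
    """Compute the demanded variable, running only the operations it needs."""
    values = dict(known)

    def resolve(name, path=()):
        if name in values:
            return values[name]
        if name in path:
            raise ValueError(f"cyclic dependency at {name}")
        if name not in operations:
            raise KeyError(f"no value or operation for {name}")
        inputs, fn = operations[name]
        values[name] = fn(*(resolve(i, path + (name,)) for i in inputs))
        return values[name]

    resolve(demanded)
    return values  # includes intermediate variables, not just the index

# Indicator values on an assumed 0..1 scale.
result = derive(
    "index",
    {"ind_housing": 0.4, "ind_education": 0.5, "ind_space": 0.3},
    OPERATIONS,
)
```

Setting different indicator values and calling `derive` again corresponds to a simple 'what if' request: the same model description is reused, and only the inputs change.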
The system architecture includes a user interface implemented with web technologies. The server part of the information system, in addition to the active knowledge subsystem Aka, contains a PostgreSQL database management system (DBMS) that stores source and intermediate data. It also contains crowdsourcing services that supply data to the database. The operations of the computational model can be executed both on the server where the Aka subsystem is located and on external computing resources (other computing servers or clusters).
The system can operate in two modes: continuous monitoring and batch simulation. The current situation is analyzed in continuous monitoring mode. The initial data are dynamically updated by crowdsourcing services, which leads to periodic updates of the intermediate and final indicators presented in the user interface. Simulation is a batch mode of operation, in which a set of source data is passed to the system input and the system computes the index value. This mode is designed to simulate hypothetical situations or situations that occurred in the past. Accordingly, the initial data are either modified or taken from past records of the state of the city.
In both modes of operation, the result of the computation is not a single number (the index value) but rather a computation tree, the root of which is the modified LSI index, and the nodes and leaves of which are spheres of influence, criteria and indicators, as well as initial and possibly intermediate computation data. This allows the user interface to show hierarchically how and from which data the index was computed.
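A computation tree of this kind can be illustrated with a small data structure. The sketch below is an assumption-laden example rather than the system's actual representation: the `Node` class, the weighted-average aggregation, the node names and the 0..1 value scale are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class Node:
    name: str
    value: Optional[float] = None  # preset on leaves (source data)
    weight: float = 1.0            # contribution to the parent node
    children: List["Node"] = field(default_factory=list)

    def evaluate(self) -> float:
        """Fill in inner-node values as weighted averages of their children."""
        if self.children:
            total = sum(c.weight for c in self.children)
            self.value = sum(c.weight * c.evaluate() for c in self.children) / total
        return self.value

    def explain(self, depth: int = 0) -> List[str]:
        """Return an indented report; call evaluate() first so values are set."""
        lines = [f"{'  ' * depth}{self.name}: {self.value:.2f}"]
        for child in self.children:
            lines.extend(child.explain(depth + 1))
        return lines

# Hypothetical tree: index -> spheres of influence -> indicators.
tree = Node("LSI*", children=[
    Node("social", children=[
        Node("housing satisfaction", 0.4),
        Node("education access", 0.6),
    ]),
    Node("environment", children=[
        Node("air quality", 0.5),
    ]),
])
tree.evaluate()
report = tree.explain()
print("\n".join(report))
```

Keeping the whole tree rather than only the root value is what lets an interface drill down from the index to the indicator or source data that drives a change.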

Discussion
The proposed adapted SULPiTER methodology and the Astana Opinion Mining macro-service can enhance the assessment of sustainable development in smart cities when some data are unavailable or difficult to collect. Multiple international sustainability indices are regularly computed for Kazakhstan, aiding the observation of sustainable development changes and the formulation of policy recommendations. Nonetheless, index calculations face challenges related to politics and data, including data disparities between regions within Kazakhstan and in comparison with foreign countries, which complicate the adaptation of foreign methodologies.
The study advances the integration of passive crowdsourcing with computational models for more dynamic and responsive urban planning. For example, Figure 7 above implies that the "transport and mobility" and "environment" impact areas require a careful examination of the reasons for their deterioration. Our study can contribute to the broader field of sustainable urban development and has potential implications for policy-making, particularly in the areas of environmental sustainability, urban planning, and citizen engagement. The model can be adapted to other urban environments and could serve as a template for other cities of Central Asia seeking to improve their sustainability assessments.
The use of a sustainable development index enables meaningful comparisons with cities that share similar socio-economic and environmental attributes. This supports the efficient adoption of successful global practices in sustainable urban development, taking policy recommendations into account. To use sustainability indices effectively, synchronized data collection and interdepartmental cooperation are essential. The experience of other countries such as China, where city sustainability indices are systematically calculated, demonstrates that positive trends in vital sustainability indicators (including industrial restructuring, smart transportation and green space development, adherence to quality standards, environmental and social monitoring, and resource conservation) result from consistent and targeted actions based on index outcomes.
Limitations of the study include the challenges of data heterogeneity and reliability in passive crowdsourcing, the scale of data analysis, and the generalizability of our findings to other urban settings. Potential biases in crowdsourced data, difficulties in interpreting large datasets, and the limitations of the current computational models constrain the generalization of the findings to different urban contexts, given the unique socio-economic and environmental characteristics of cities. The small number of confirmed questionnaires is a further drawback of the reported results.
Areas for future research include refining the computational models, exploring additional data sources or indicators that could enrich the assessment model, and applying the system in different urban contexts to validate and enhance its applicability. Further developments might include testing the methodology in different cities or for other aspects of urban sustainability, and adapting the system to track evolving urban sustainability goals. Ongoing research and innovation play an important role in addressing the complexities of urban sustainability and in contributing to more livable, resilient smart cities.

Conclusion
In the context of rapidly evolving smart cities, where technology is deeply intertwined with urban life, understanding and harnessing the sentiments of city residents are essential aspects of sustainable development. The integration of sentiment analysis methodologies into the evaluation of sustainable development criteria offers a promising avenue for gaining insights into public perceptions and emotions regarding smart city initiatives. Various studies have explored sentiment analysis frameworks, architectures, and applications within the smart city domain. These investigations have highlighted the significance of big data, diverse data types, and emerging technologies in shaping the information foundation of smart cities. From textual data to multimodal data, sentiment analysis has demonstrated its adaptability in gauging public sentiment. Moreover, the adoption of advanced computational models, including machine learning and deep learning techniques, has further enhanced the accuracy and efficiency of sentiment analysis systems. Beyond technical advancements, the studies have also emphasized the importance of citizen engagement and participation in smart city processes. While there are generally positive sentiments associated with smart city initiatives, it is crucial to recognize and address the potential pitfalls of 'smart city mirages,' where branding campaigns may divert attention from existing urban challenges and their associated socio-economic consequences. Ensuring meaningful citizen involvement is vital for informed urban planning and effective management practices. Additionally, the incorporation of sentiment analysis in areas such as healthcare crisis management during events like the COVID-19 pandemic underscores its practical utility in addressing real-world challenges. Effective use of sustainable development indices requires synchronized data collection, interdepartmental cooperation, and a specific political focus on sustainable urban development involving collaboration between local governments and government bodies in areas like ecology and urban planning.
In conclusion, the integration of sentiment analysis techniques into sustainable development assessments in smart cities represents a dynamic and promising field of research and application. It offers a multifaceted approach to understanding and improving the urban environment, while also highlighting the importance of public perception, citizen engagement, and the ethical implications of technology-driven urban development. As smart cities continue to evolve, leveraging sentiment analysis will remain a critical component of achieving truly sustainable and citizen-centric urban futures.

Figure 1. Steps to use the SULPiTER tool.

Figure 2. Stages of calculation of the logistics sustainability index LSI.

Figure 3. Model of the LSI base table adapted for assessing the sustainability of the urban environment.

Figure 5. Statistics of processed messages from the data collection macro service dashboard as of October 2023.

Figure 6. Number of messages after sentiment analysis.

Figure 8. Analysis of indicators using VADER.

Table 3. Analysis of indicators: obtained values.