ChatGPT4 writes Stan code so I don’t have to.

Several months ago I (Phil Price) wrote a Stan model to do some time series forecasting. It took me almost a full day to get it running and debugged. Today I decided to test ChatGPT4, by seeing if it could write a Stan model to give me the same functionality. I have been using ChatGPT to help me with coding, but in the past I’ve used it for little stuff, such as “I have a data frame with column names that all start with mm, and some of the names have ( and ) in them; write a function to remove all of that”. I have just started using it to write entire programs for me, and it looks like it is going to be enormously useful. Below, I present a verbatim transcript of the start of a ChatGPT4 session. At the end I got tired of cutting and pasting — and you would have gotten tired of reading it — but I tell you what happened. And at the very end, I’ll summarize my thoughts on the ChatGPT4 programming experience thus far.


Phil: I have monthly time series data. I want to fit a model to the data (typically from the past five to eight years) and use the model to forecast future values. Let’s call the data y[t]. Here are some things I want to include:
1. Seasonality. If the value in March is high in one year it is probably also high in the following year, and so on for other months. So I want ‘month effects’.
2. Autocorrelation in month effects. The month effect, for a given month of the year, is not iid random. If the March effect is high for several years and then drops a lot, it is more likely to stay low than to jump back up.
3. Local trend. If the value has increased for the past several months it will probably keep increasing, although it might not.
4. Soft constraints. The value of y[t] is very unlikely to fall below a value y_low, or to go above y_high. I don’t want to impose the constraints after the fact. Instead, I want to repel the forecasts from these limits. I want a fairly short-range repulsion, so the forecasts aren’t affected unless they get close to the limits.

Please write a Stan program to do this, and include comments that show where each of those four requirements is implemented.

ChatGPT4: Here’s a Stan program that incorporates seasonality, autocorrelation in month effects, local trend, and soft constraints:

data {
  int<lower=1> T;          // number of time steps
  int<lower=1> T_future;    // number of future time steps for forecasting
  int<lower=1> M;           // number of months in a year
  real y[T];                // observed time series data
  real y_low;               // lower soft constraint
  real y_high;              // upper soft constraint
}

parameters {
  real mu;                  // overall mean
  vector[M] month_effect;   // month effect for each month
  real<lower=0> sigma_month;// standard deviation of month effect changes
  real<lower=0> sigma_trend;// standard deviation of trend changes
  real<lower=0> sigma_y;    // observation noise standard deviation
  real phi;                 // autoregressive trend parameter
  real trend[T];            // local trend
}

transformed parameters {
  real y_hat[T];            // fitted values
  
  for (t in 1:T) {
    int month = (t - 1) % M + 1;
    y_hat[t] = mu + month_effect[month] + trend[t];
  }
}

model {
  // 1. Seasonality: Month effects
  month_effect ~ normal(0, 1);

  // 2. Autocorrelation in month effects
  for (m in 2:M) {
    month_effect[m] ~ normal(month_effect[m-1], sigma_month);
  }

  // 3. Local trend
  trend[1] ~ normal(0, sigma_trend);
  for (t in 2:T) {
    trend[t] ~ normal(phi * trend[t-1], sigma_trend);
  }

  // 4. Soft constraints
  for (t in 1:T) {
    target += normal_lpdf(y[t] | y_hat[t], sigma_y) +
              normal_lccdf(y_low | y_hat[t], sigma_y) +
              normal_lccdf(y_high | y_hat[t], sigma_y);
  }
}

generated quantities {
  real y_future[T_future];  // forecasts of future values
  
  for (t in 1:T_future) {
    int month = (T + t - 1) % M + 1;
    real trend_future = phi * trend[T];
    y_future[t] = normal_rng(mu + month_effect[month] + trend_future, sigma_y);
  }
}

Let me explain where each of the four requirements is implemented:

  1. Seasonality: The month_effect vector stores the effect for each month (lines 15 and 29). The fitted value y_hat[t] is the sum of the overall mean mu, the month effect for the corresponding month, and the local trend at time t (line 34).
  2. Autocorrelation in month effects: The autocorrelation in month effects is modeled by specifying that each month effect (except the first one) is drawn from a normal distribution with the mean equal to the previous month effect (line 39).
  3. Local trend: The local trend is modeled by the trend vector. The trend at time t is drawn from a normal distribution with the mean equal to the phi * trend[t-1] (line 43). The parameter phi controls the autoregressive trend.
  4. Soft constraints: Soft constraints are incorporated in the model block by modifying the log probability density function (line 48). The normal_l

Phil: Your answer got truncated. Please explain how the soft constraints work.

ChatGPT4: I apologize for the truncation. I will explain how the soft constraints work in the Stan program:

In the model block, soft constraints are incorporated by modifying the log probability density function (line 48). Instead of using the usual likelihood y[t] ~ normal(y_hat[t], sigma_y), we add two additional terms:

  • normal_lccdf(y_low | y_hat[t], sigma_y): This term calculates the log of the cumulative distribution function (CDF) of a normal distribution with mean y_hat[t] and standard deviation sigma_y at the lower constraint y_low. The CDF gives the probability of a value being less than or equal to y_low. Since we want to penalize the forecasts close to or below the lower limit, we use the log complementary CDF (lccdf), which gives the log probability of a value being greater than y_low. This term increases when y_hat[t] is closer to y_low, effectively repelling the forecasts from the lower limit.
  • normal_lccdf(y_high | y_hat[t], sigma_y): This term works similarly to the previous one but for the upper constraint y_high. It calculates the log probability of a value being greater than y_high. This term increases when y_hat[t] is closer to y_high, repelling the forecasts from the upper limit.

By adding these two terms to the likelihood, the model will adjust its parameter estimates to avoid forecasting values too close to the soft constraint limits. The strength of the repulsion depends on the standard deviation sigma_y; smaller values of sigma_y will result in stronger repulsion.

Phil: Will that model work for just about any input data, or should I standardize y[t]?

ChatGPT4: The model provided can work for a wide range of input data, but it is generally a good practice to standardize your input data, especially when using probabilistic models like those in Stan. Standardizing your data can lead to better convergence and more efficient sampling.

To standardize your input data y[t], you can calculate the mean and standard deviation of the input data, then subtract the mean and divide by the standard deviation for each data point. In the Stan program, you can add the following to the transformed data block:

transformed data {
  real y_mean = mean(y);
  real y_sd = sd(y);
  real y_standardized[T];

  for (t in 1:T) {
    y_standardized[t] = (y[t] - y_mean) / y_sd;
  }

  real y_low_standardized = (y_low - y_mean) / y_sd;
  real y_high_standardized = (y_high - y_mean) / y_sd;
}

Then, replace y with y_standardized and y_low and y_high with their standardized versions in the model block. After fitting the model, you can convert the forecasts back to the original scale in the generated quantities block:

generated quantities {
  real y_future[T_future];  // forecasts of future values

  for (t in 1:T_future) {
    int month = (T + t - 1) % M + 1;
    real trend_future = phi * trend[T];
    real y_future_standardized = normal_rng(mu + month_effect[month] + trend_future, sigma_y);
    y_future[t] = y_future_standardized * y_sd + y_mean;
  }
}

By standardizing your input data, you make the model more robust and ensure better convergence and efficiency during the sampling process.

Phil: I prefer to standardize my data outside the Stan model. Please provide Python code to standardize the data and to call the Stan model.

ChatGPT4: Certainly! Here is a Python script that standardizes your data outside the Stan model, calls the Stan model, and rescales the forecasts back to the original scale. This script uses the CmdStanPy library to interact with Stan, but you can use other libraries like PyStan if you prefer.

First, make sure you have the CmdStanPy package installed: pip install cmdstanpy

Now, let’s proceed with the Python script:

import numpy as np
import pandas as pd
from cmdstanpy import CmdStanModel

# Load your data (replace this with the appropriate code to load your time series)
y = np.array([...])  # Replace with your actual data

# Standardize the data
y_mean = np.mean(y)
y_sd = np.std(y)
y_standardized = (y - y_mean) / y_sd

# Set the other required data inputs
T = len(y)
T_future = 12  # Number of future time steps for forecasting
M = 12  # Number of months in a year
y_low = 10  # Lower soft constraint
y_high = 50  # Upper soft constraint

# Standardize y_low and y_high
y_low_standardized = (y_low - y_mean) / y_sd
y_high_standardized = (y_high - y_mean) / y_sd

# Set up data dictionary
data_dict = {
    'T': T,
    'T_future': T_future,
    'M': M,
    'y': y_standardized,
    'y_low': y_low_standardized,
    'y_high': y_high_standardized
}

# Compile the Stan model (use the provided Stan code in a file named "time_series_model.stan")
stan_model = CmdStanModel(stan_file="time_series_model.stan")

# Fit the Stan model
fit = stan_model.sample(data=data_dict)

# Extract the forecasts
y_future_standardized = fit.stan_variable("y_future")

# Rescale the forecasts back to the original scale
y_future = y_future_standardized * y_sd + y_mean

# Calculate the mean and percentiles of the forecasts
y_future_mean = np.mean(y_future, axis=0)
y_future_lower = np.percentile(y_future, 2.5, axis=0)
y_future_upper = np.percentile(y_future, 97.5, axis=0)

# Print the results
print("Mean forecasts:", y_future_mean)
print("Lower bound (2.5 percentile):", y_future_lower)
print("Upper bound (97.5 percentile):", y_future_upper)

Phil: [After running the model and generating forecasts] I thought the constraints would keep all or almost all of the forecasts to between y_low and y_high but that’s not the case at all, they go much higher and much lower. They go much lower than 0.5 times the minimum value in the data, and much higher than 2 times the max. Am I doing something wrong with the definition of y_low and y_high, or is there a mistake in the code?

ChatGPT4: I apologize for the confusion. The issue is due to the fact that y_low and y_high are defined based on the original data (y_kwh) but not adjusted to the standardized scale. When you use y_low and y_high in the Stan model, they should be defined relative to the standardized data.

To fix the issue, you need to standardize y_low and y_high by applying the same transformation as you did for y. [ChatGPT4 provided the code, not shown here]

Phil: That’s pretty much what I’m doing, but I still have the problem. Here’s the actual code I’m using: [code not shown, it’s the same as above except instead of y_low=10 and y_high=50 I set these to be 0.5 times the minimum value in the data, and 2 times the maximum.]

ChatGPT4: I apologize for the confusion earlier. I reviewed the Stan code again and realized that the soft constraints are not strong enough to keep the forecasts within the desired limits. The cauchy_lpdf in the Stan model does not repel the forecasts as strongly as you intended. To address this issue, we can use a different distribution for the soft constraints, such as a truncated normal distribution.

Please update the Stan code for the soft constraints as follows:

Replace the lines:

for (t in 1:(T + T_future))
    target += cauchy_lpdf(y_hat[t] | y_low, 0.01) + cauchy_lpdf(y_hat[t] | y_high, 0.01);

with:

  real y_low_sd = 0.01;
  real y_high_sd = 0.01;
  for (t in 1:(T + T_future)) {
    target += normal_lcdf(y_low | y_hat[t], y_low_sd) + normal_lcdf(y_high | y_hat[t], y_high_sd);
  }

This change replaces the Cauchy distribution with a truncated normal distribution for the soft constraints. The normal_lcdf function is the cumulative distribution function (CDF) of the normal distribution, which can be used to create a stronger repulsion effect.

After making this change, the forecasts should be more constrained within the desired limits defined by y_low and y_high. Please update your Stan model with this new code and re-run the analysis. If you still experience issues, please let me know, and we can explore other ways to strengthen the soft constraints.

Phil: Unfortunately I am still getting an error: RuntimeError: Error during sampling: Exception: array[uni, ...] index: accessing element out of range. index 62 out of range; expecting index to be between 1 and 61 (in 'time_series_model.stan', line 48, column 4 to column 98)

ChatGPT4: I apologize for the oversight. The error occurs because the transformed parameter y_hat is defined only for the observed time series (of length T), but in the soft constraints loop, we are accessing elements up to T + T_future. To fix this issue, we should extend the y_hat array to include the future time steps as well. Here’s the corrected code:

transformed parameters {
  real y_hat[T + T_future];  // fitted values including future time steps
  
  for (t in 1:T) {
    int month = (t - 1) % M + 1;
    y_hat[t] = mu + month_effect[month] + trend[t];
  }
}

With this modification, y_hat is now an array with T + T_future elements, allowing the soft constraints loop to access future time steps without going out of range. Please update your Stan code with this change and re-run the analysis. If you still encounter any issues, please let me know, and we can explore further adjustments.

=====
At this point I will stop with the verbatim dialog. There were several problems with the code: loops that ran over the wrong range; variables calculated in the transformed parameters block that should have been in the generated quantities block; an error I introduced while cutting and pasting code from ChatGPT4; and a rather fundamental problem with the truncated-normal approach to the penalties (I ended up switching to a sort of exponential repulsion). It took about one additional hour to get the code to run and to confirm to my satisfaction that it’s doing what I want. All told, this took about 2.5 hours. And all I did was end up with a model very similar to one I wrote by hand several months ago. But that one took me something like six hours! So 2.5 hours is a huge speedup. And I am still learning how to get ChatGPT to write code for me. As I get better I expect to get faster.
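
To give a sense of what I mean by “exponential repulsion,” here is a sketch of the general idea. This is illustrative only, not my actual code: the 0.1 length scale is made up, and you would tune it (and which quantities get repelled) to your own data.

  // Short-range soft limits via exponential repulsion (illustrative sketch).
  // The penalty is essentially zero while y_hat[t] is well inside (y_low, y_high)
  // and grows rapidly as y_hat[t] approaches either limit; 0.1 is an assumed
  // repulsion length scale on the standardized scale of the data.
  for (t in 1:T) {
    target += -exp(-(y_hat[t] - y_low) / 0.1);   // repel from the lower limit
    target += -exp(-(y_high - y_hat[t]) / 0.1);  // repel from the upper limit
  }

Something along these lines goes in the model block in place of the CDF-based penalty terms.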

Working with ChatGPT4 for this task was like working with a good intern programmer who is rather forgetful and a bit scattered. At one point it gave me a line like target -= normal_lcdf(…). I said “target -= looks weird, is that really acceptable syntax?” and it said Yes! (It isn’t.) Also, it sometimes acts as though it can’t hold more than one idea in its head at a time. Considering it actually holds _zero_ ideas in its ‘head’ at any time, it’s very impressive that it can do this at all. Still, it’s frustrating when we seemingly agree to modify a chunk of code in a specific way and, two minutes later, it reverts to the old way. That happens quite often.
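
For reference, Stan’s target statement only comes in the increment form, so the way to subtract a term is to add its negation. The variable names below are placeholders, not the actual line from my model:

  // valid: increment the target by the negated term
  target += -normal_lcdf(x | mu, sigma);
  // invalid: Stan has no "target -=" statement
  // target -= normal_lcdf(x | mu, sigma);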

ChatGPT4 can be frustrating in its forgetfulness, its occasional syntax errors, etc. And on both this programming project and another one in which I’ve put ChatGPT4 to serious use rather than idle testing, I have had to be very attentive because it sometimes produces code that runs but does not do what I asked: the model is not the one I specified, or a pre-processing step is wrong, or whatever. I have not yet been able to simply tell it what I want and get the answer out of it directly.

I thought I had learned some lessons on my other project and that with this one I would indeed be able to get good code right away. For instance, it seems to help if you say “please implement features X, Y, and Z, and identify where in the code these are implemented”, so I tried that above…and it still wasn’t enough.

But: This has already changed the way I code. I am not sure I am ever going to write code completely by hand again! And I expect these tools to get better fast.


74 thoughts on “ChatGPT4 writes Stan code so I don’t have to.”

  1. Interesting. I hadn’t tried coding with it yet, but maybe I will try it out on a few demos I have been meaning to write.

    >And all I did was end up with a model very similar to one I wrote by hand several months ago. But that one took me something like six hours!
    But presumably it would have taken less than six hours to write by hand a second time?

  2. You say 2.5 hours writing ChatGPT code is a speedup over 6 hours doing it on your own, but during the ChatGPT session you already had experience with that sort of problem from the 6-hour exercise. How long would it have taken you starting without that advantage?

    • Yeah, that’s the same point Jessica made, above. And it’s true: writing and debugging my own model definitely helped when I was trying to get ChatGPT to write one for me. Perhaps it would have taken me an additional half hour, or hour…or, just possibly, two hours…to get to a satisfactory final model if I didn’t already have some experience. But that would still have been faster than doing it all by myself!

      I suggest giving it a try…although it’s also not crazy to wait a month or two and see what happens. These tools are never going to be worse than they are right now.

      • “Comparisons to historical data aren’t enough; you need a prospective study.”

        Well… this sounds awfully close to a classic argument supporting RCTs over matching historical control data to new experimental data…

  3. There are several pre-prints published in arXiv on these matters almost daily. I haven’t followed these that closely, but it seems that this AI generates a lot of buggy code. There are also some scarier things involved like potentially introducing security vulnerabilities into a code base. The lower the level of code, the more it seems to get wrong.

    What is perhaps more interesting is the training data involved. I suppose much of it comes from sites like Stack Overflow and, in the case of Stan, the project’s own documentation. As for the example provided, I see little reason to rely on AI for this kind of stuff (unless you are learning new things, in which case it can indeed be valuable).

    I am also skeptical about its ability to delve into bigger issues such as software architectures. But who knows? Perhaps bigger improvements are likely in the domain of automated code/bug correction.

    • Indeed, it does generate lots of buggy code. But so do I! For me, and I think for a lot of people, “programming” is mostly debugging mixed with some refactoring, just as writing is mostly rewriting.

      I’m finding ChatGPT4 to be frustrating, because it can be stupid and forgetful and can make mistakes both obvious and subtle.

      But I’m also finding ChatGPT4 to be extremely useful. It makes mistakes, but fewer of them than I do. And it’s very fast compared to me; fast enough that I can try several different models or transformations in the time it would take me to code just one myself.

      The underlying ‘large language model’ has a zillion parameters, so in a sense there is a lot of flexibility in ‘tuning the model’, but, from what I understand, it is actually quite hard to work on a specific issue. But even I have some ideas on how to improve its ability to code. One obvious thing would be to have a separate module that checks the code’s syntax and, if it finds an error, automatically feeds the code back in, along with the assertion that there is an error in such-and-such a place. It’s kind of funny: if you put in code and tell it there’s an error (and where it is) it can almost always fix it; but this does not stop it from generating code that has those errors in the first place. It’s oddly human-like behavior, or perhaps oddly Phil-like behavior: I’ll write some code, and the editor will highlight an error for me; or else I’ll write some code and it will fail with an error. Once I know the error is there, I can see it and fix it. ChatGPT4 is the same way, at least some of the time.

      Anyway I strongly suggest giving it a try. You may as well: I think in a year, maybe less, these sorts of tools will be so helpful that you’d be a fool not to use them. May as well start getting used to them.

      • I agree with these points. Though I restate that a big problem will arise if novice programmers blindly rely on these tools, which will presumably worsen the already pretty catastrophic situation with software security.

        But given the focus of this blog on political science, I’d like to hear more focused opinions regarding the hypothetical effects of these generative AI things upon democracy and politics. To me, it seems certain that today’s information chaos will severely deepen, especially when we soon start to have generative models for video content and the like. Writing from Europe, I can say there are already several cases of image manipulations circulating on matters related to Ukraine.

        • So here is a recent position paper from people also involved in OpenAI:

          https://arxiv.org/pdf/2301.04246.pdf

          I think it raises quite a few important points at a general level. However, their discussion regarding democracy and politics is still quite superficial, mainly revolving around disinformation and the like.

  4. Maybe I can share my recent experiences with ChatGPT coding requests. Two weeks ago, I was thinking in circles and not really mentally there while sitting in front of my desktop, and I had the ‘brilliant’ thought I could do what I learned from this blog and ask ChatGPT to write the function I needed to proceed. It used code I didn’t understand at first, but it worked.
    Having that good experience in mind, I asked the bot yesterday to write a function to map ICD10 codes to their respective chapters. (ICD10 codes are the codes for the diagnoses your doctor writes down. They are sorted into chapters with odd boundaries, like chapter two reaching from code C00 to D49. I’m sure there are historical reasons, but it makes for an annoying exercise to find the chapter for a diagnosis code.) I tried to coax the bot into writing this rather simple function, which is not difficult but annoying to write. I couldn’t get it to do it. It appears as if its information on how ICD10 chapters work was in a compartment that cannot be opened at the same time as the ‘coding compartment.’ A shame really! I’ll keep trying to code with ChatGPT, though; I see a lot of potential there. But I appreciate the risks pointed out in the commentaries to this blog entry and will take them to heart!

    • ChatGPT helped me figure out what self-attention vs. attention were and then some detail about batch normalization vs. something else normalization. I don’t think these things are supposed to be hard to know, I’ve just had trouble figuring them out. So yeah, these were real breakthrough moments for me cuz I could stop being annoyed with the uncertainty and just write the code I wanted to write.

      It’s given me fake or wrong results. Now that I’ve seen it happen though, it seems fine. Like, half the stuff I Google is wrong or misleading in some way, so who cares — functionally the same deal for what I’m doing.

      • Ben:

        I wouldn’t say the chatbot is “functionally the same” as Google search. Rather, I’d say “better in some ways and worse in others.” Chatbot is better in that it can give customized code, which you can’t get from Google. Google is better in that it gives you the source of its information, which you can’t get from chatbot. I’m sure the boffins are working on tools that combine these two features.

        • Oh yeah, agreed. I meant to say that the made up stuff is no worse than what I’d get searching, not that the overall thing is the same as search (indeed, it helped me figure out those nagging terminology problems I had).

    • I should mention that chatGPT4 is much better (at a lot of things, including coding) than chatGPT3 and chatGPT3.5. I assume whatever comes next will be another substantial step up. Only drawback is you have to pay, but at least for me it is easily worth it.

  5. So many people seem intent on “showing” (really claiming) that ChatGPT is bad, just hype, or not effective. Why? All I see from these experiences and my own is that ChatGPT has enormous potential when used by someone that is capable. If they are not, it becomes dangerous. How is that different than any other technology? The same could be said about GUI interfaces for statistical analysis. The cause for optimism would be that, given what ChatGPT can do so easily, humans should strive for, and be held accountable for, doing better than ChatGPT.

    Unfortunately, there are masses of people who may use it blindly or in place of human judgement. That indeed is dangerous, as is the use by knowledgeable people who will use it for ill purposes. I am not optimistic – both of these dangers are real, and humans have proven themselves incapable of devising ways to protect against misuse (of virtually any technology). So, the comments designed to show the flaws of ChatGPT seem off-point to me. They are speaking to the choir. As Phil’s post demonstrates, on its own ChatGPT will not replace what he does. But it will help him – precisely because he has the skills and judgement to use it effectively. If I use it to code, that would really be dangerous – but who is going to stop me? That is where the real danger lies.

    • Fair enough. I agree with the point that blind use of tools, whether artificial intelligence or something else, is dangerous.

      But let’s take a step forward in this line of argumentation: if blind use is bad or dangerous, how dangerous will malevolent use be? Think about finding exploitable vulnerabilities, help with writing malware, spam, social engineering, phishing, and everything in between.

      As for the hype, the monetization is well underway by so-called Big Tech; they seem to capitalize simultaneously on both the alleged or real fears and the great promises. Yet we should remind ourselves that these things are being developed throughout the world.

      • I agree with you. I am quite worried about both blind use and malevolent use and not optimistic we can control either. The OpenAI document that you linked to above spells out quite nicely both the dangers and hints at what is required. But as soon as I see the words “collective action” I cringe – what examples do we have of using collective action to control the use of technology? Guns? Nuclear weapons? Fossil fuel use? I’m not saying these things are bad – what I believe is that the dangers are real and we have exhibited little progress towards the collective action required to use them responsibly.

        • So here is an evaluation, from today, of ChatGPT’s ability to produce secure code:

          https://arxiv.org/pdf/2304.09655.pdf

          The conclusion is:

          “In this experiment, we asked ChatGPT to generate 21 small programs, and found that the results often fell way below even minimal standards of secure coding. Nonetheless, we did find that the interaction between with ChatGPT on security topics to be thoughtful and educating and after some effort, we were able to coax ChatGPT into producing secure code in for most of our use cases. In this context, while we believe that chatbot are not yet ready to replace skilled and security aware programmers, they may have a role to play as a pedagogical tool to teach students about proper programming practices.”

          I think this pretty much agrees with what we discussed yesterday.

        • There is an interesting dynamic going on everywhere.

          I think the general public may eventually lose one of its favorite “quite fact-based” sources, Wikipedia. So ChatGPT can already produce everything needed to “write” a ready-made Wikipedia page on any topic imaginable:

          https://www.reddit.com/r/ChatGPT/comments/12udskc/you_can_ask_chatgpt_to_write_a_wikipedia_article/

          Then Wikipedia has a policy on LLMs, which are generally discouraged:

          https://en.wikipedia.org/wiki/Wikipedia:Large_language_models

          They also require attribution, which is again violated by the companies doing LLMs:

          https://en.wikipedia.org/wiki/Wikipedia:Copying_within_Wikipedia

          And then there already seem to be huge problems with curating LLM-generated content and its inaccuracies:

          https://en.wikipedia.org/wiki/Wikipedia_talk:Large_language_models

          Now what is interesting in this dynamic is that LLMs train heavily on Wikipedia to gain at least some “facts”. Then people subsequently use LLMs to create “content” on Wikipedia. Eventually this self-loop may compromise all information quality. Like elsewhere, Wikipedia may then eventually lose its human contributors as there is little motivation for trying to compete with machine-generated spam.

        • Jukka:

          Indeed, had Ed Wegman had access to a chatbot, there seems little doubt he would’ve used it to write his articles. It does seem as though we are now swimming through thick seas of spam and may soon drown in it.

      • Still here because I am somehow addicted to understanding what people all over the Internet are saying about LLMs. Here is a great thread on how people are fiercely opposed to plans to roll out an LLM on the popular Stack Overflow Q&A site:

        https://meta.stackexchange.com/questions/388401/new-blog-post-from-our-ceo-prashanth-community-is-the-future-of-ai

        The conclusion seems to be that the inaccuracies are so blatant and so severe that such a deployment would likely ruin the whole site and the company with it. Here is another thread with great comments:

        https://news.ycombinator.com/item?id=35649496

        As pointed out by someone there, the real danger with these things is the release of several different artificial lunatics onto the Internet, some more delirious than others.

        Thus, the safety concerns and the need for regulatory action are very much real. But these are not because of AGI but because these things are so immature that they will pose a great risk to various industries, public sector organizations, and individuals. Harms will certainly occur due to unwarranted deployment.

        Therefore, some companies are already trying to capitalize on these safety issues as well:

        https://www.giskard.ai/knowledge/the-safest-way-to-use-chatgpt-and-other-llms

        The business model of this company is the usual one: let a ChatGPT or something similar run wild and then let users do the work of correcting the inaccuracies and biases, which, in turn, will improve their product.

        A lot of platforms have recently also imposed bans on their APIs or placed restrictions on scraping. What will happen to search if, say, newspapers forbid search engines from crawling because robots.txt is apparently no longer honored?

        Now, it will be interesting to see what happens to the whole Internet due to the apparent deployment that is already occurring. My prediction is that large portions of the “commercial Internet” will turn into incomprehensible and inaccurate machine-generated realms within which different parts continuously replicate each other. Once enough inaccurate AI-generated material is out there, it will also become increasingly difficult (if not impossible) to build more accurate models.

        If this prediction is true, some positive effects are also possible. Maybe people will turn more and more to traditional media, science, think tanks, and maybe even old-fashioned (non-online) books. Here in the Nordic countries there are also some quite unexpected turns; some people are increasingly promoting things like real-identity verification and even social media orchestrated by public sector organizations to ensure that commercial exploitation will not occur.

        Another thing to watch out for is how the open source community will react to the illegal harvesting of their code for the commercial purposes of Copilot. It may be that a lot of projects will migrate from GitHub either to other hosting providers or even back to maintaining their own infrastructures. I’d consider this a positive trend. Thus far, the scientific computing community (e.g., the R project etc.) seems to have embraced ChatGPT and the like, but an awakening may be coming there too, given that much of the stuff on CRAN is under the highly restrictive AGPL-3 and citations/attribution matter to people involved in these projects.

        • I can’t resist pointing out that if we could study the people in power in existing organizations we would likely conclude that they “are so immature that they will pose a great risk to various industries, public sector organizations, and individuals.” I don’t mean to say that I think the dangers are underestimated – I believe them to be real and serious. But humans are also a real danger and I don’t see much evidence that we’ve learned to control the humans in power.

        • Sure: all this more or less traces to the ruthlessness of Big Tech and associated elites. Here are also some good speculations from the developers of these things:

          https://news.ycombinator.com/item?id=35652452

          They seem to increasingly speculate that the much despised post-truth philosophers, postmodernists, and whatnot were kind of right after all.

      • Especially since it’s so easy to trick it into doing something it doesn’t want to do by saying “ok, I won’t do that. What steps should I be careful to avoid doing?”

  6. As of a couple weeks ago, I now code with a ChatGPT window open. I can no more imagine removing it than I could have imagined removing StackOverflow last month. Now I use GPT first instead of Google.

    ChatGPT has not been optimized as a coding assistant. GitHub CoPilot is a custom AI-driven API for coding help. But it’s not based on as powerful of a language model. That’s the future. I haven’t tried it yet, but am going to later this week. When using ChatGPT, it helps to coach it through the structure of a problem. Ask it to outline how to formulate a solution step by step, then ask it to do the steps one at a time (and recurse for really hard problems). That’s how I’m getting it to write a mega-dungeon for D&D for me. You have to be descriptive of the kind of writing you want or you get something that looks like it’s aimed at a junior high student. You have to constantly remind it to show and not tell when it’s writing, but when prompted, it can do it. What I can’t prompt it to do so far is to have any nuance in its writing. I’d like to get it to write at the level of a New Yorker article (i.e., better than me), but we’re not quite there yet.

    All of these tools are better in more popular languages and for more popular tasks. GPT4 is much better than GPT3 at things like namespace control and writing efficient code. I’m writing an introduction to Stan and it literally wrote all of my plotnine code (that’s Python’s ggplot clone)—I used to spend hours struggling with StackOverflow and the ggplot docs, and now I just describe the graph I want. I do know the “gg” part, but example-by-example mentoring from ChatGPT would’ve been a better way to learn the grammar of graphics if I didn’t know it. You don’t even need GPT4 for this kind of small-scale coding help.

    My colleagues are using it to do things like refactor 500 line Python programs into PyTorch or into JAX. Not in one go, but a bit at a time. Many of my colleagues are using it to copy-edit their scientific papers. Nobody is using it to replace PubMed or Google Scholar.

    I got GPT4 so I could run it through a battery of psycholinguistic tests. It aces the first level of all of this from deep center embedding to coreference to using world knowledge to disambiguate attachment ambiguities. For someone who worked on NLP for 25 years, this is nothing short of mind blowingly amazing. There’s an entire field that was made obsolete over the last 5 years (the attention paper, which introduced the transformer architecture for the large language models everyone is building, was published by a team at Google in 2017). I’m not exaggerating here. With zero-shot learning (just tell it what you want), ChatGPT can do pretty much anything a custom-built NLP system can do other than for very specialized tasks (which can now all be made much better by switching to a transformer model from however they were built in the 2020s).

    • What do you mean by “psycholinguistic tests”? While I am not a psychologist, it seems to me that people are once again rerunning all the questionable things with ChatGPT. So you have claims that it is politically biased:

      https://arxiv.org/pdf/2304.07333.pdf

      These claims are based on things like the “Big Five personality” and the “Dark Factor test”. To my understanding (which may well be wrong), all these are questionable. But funnily enough, these things are taken seriously and hence billionaires are funding their own generative models for pushing “conservative” or “liberal” agenda. Hence, the argument about disinformation again follows.

      • Hey Jukka,

        as the author of the paper you’re referring to: the “personality assessments” that we made ChatGPT take are used to see how it perceives itself. Of course we know (and state) that they are in part controversial (see MBTI) and that other tests might be more appropriate (do you happen to have recommendations?). However, the political biases that we and other researchers have observed are a whole other part of the contribution. The tests used for this are also questionable if you will, but I would argue that so is every test that quantifies something that is subjective in nature?
        In any case, thank you so much for your feedback!

        • As said, I am not an expert in this domain. But surely enough, even a quick literature search would indicate a lot of criticism about things like the “Big Five personality”.

          As for political biases, I do not disagree with your observations as such. There was a good media take in the Washington Post today about the training sources they’ve used. A lot of questionable material was involved, including even sites that can be labeled extremist. But as Altman and associates have claimed, the era of large-scale LLMs may be over for various reasons.

          As for politics in general, I somewhat fear that we’re going to see custom LLMs trained by various political actors and their funders. As you seem to be from Germany, consider a custom LLM designed to push Alternative für Deutschland’s political goals, and so forth and so on.

        • A little bit more:

          the situation in Germany seems already quite “innovative”, given that some media outlets therein have already spread AI-based disinformation about the F1 racer Schumacher:

          https://hardwaresfera.com/en/noticias/otros/entrevista-michael-schumacher/

          With knowledge about the proper locations of data for training, I think it wouldn’t be that difficult to build an LLM promoting, say, the Reichsbürger agenda.

          Of course, every country nowadays has their own disinformation themes, although many of the themes seem to spread across borders.

          My point being that the issues span well-beyond the LLMs developed by the big players.

      • Ah, I see the confusion. I meant cognitive psychology. There’s a lot of literature growing up in this vein in computer science of all places. For example, Allyson Ettinger in 2020 wrote What BERT Is Not: Lessons from a New Suite of Psycholinguistic Diagnostics for Language Models. All of the things BERT (an earlier transformer-based LM than GPT) failed at, GPT4 does with no problem.

        For example, how well can it handle center embeddings? It’s well known that human short-term linguistic memory limits our ability to center embed when listening to speech. Here’s a sequence of 0, 1, 2, and 3 center embeddings:

        • The mouse scurried.
        • The mouse the cat chased scurried.
        • The mouse the cat the dog bit chased scurried.
        • The mouse the cat the dog the person scolded bit chased scurried.

        GPT4 can flawlessly sort out who did what to whom in these cases. Humans have a hard time in spoken language starting with double embedding.

        I looked at garden path sentences, which are related, like

        • The horse raced past the barn fell.

        This plays off the fact that “raced” is overloaded as the past and past participle in English (cf. “to run”). Humans typically want to complete the sentence with “raced” as the main verb, but in fact it’s a subordinate clause and “fell” is the main verb.

        I also looked at concept completion and negation as in the above linked paper, and I would say its performance is better than any human’s now at this task, but then this is the bread and butter of an LLM.

        If I use a prepositional phrase with “pied piping”, can it track the quantifier scope? For example, I asked it how many people broke legs if I say “this is a table every leg of which someone broke” and it understands the ambiguity.

        I haven’t tried giving it standard educational reading comprehension tests, but I think I’ll try that next. But there are lots of people doing this in a much more organized way.

  7. Wake me up when you don’t have to reveal massive amounts of sensitive information to a third party to make it work (i.e., when it runs on your own computer).

    I’m surprised there haven’t been massive firings for copy-pasting trade secrets etc. into ChatGPT yet.

    • I think there have already been plenty of these incidents, but I lack the time to find the media sources. If I recall correctly, lawsuits and litigation are also occurring in many parts of the world.

      These cases also underline the business logic: the companies now leading AI work will only outpace their rivals as more and more people and smaller companies are supplying their datasets to the AI companies, which will then incorporate these into their training data.

      As for local use, there are already open source projects underway on this front: e.g.,

      https://www.together.xyz/blog/redpajama

      This is the direction I would personally also welcome. I would like to also see custom-tailored LLMs based purely on scientific work and scholarly records. Alas, open access publishing has not yet progressed to the extent that would make this possible.

      • Suppose you’re using this to price electricity and OpenAI is selling that information to electricity market counterparties, including the text of your source code?

        • Or if that’s not relevant to this particular instance, consider just a bazillion instances where it is relevant. Can Morgan Stanley use this for their trading algorithms? Can industrial controller manufacturers use it for sewage plant control systems? Can it be used safely for developing an accounting system for medical expenses? Student financial records? Etc etc? The answer is unabashedly NO. The infosec issues are HUGE and it bothers me that this isn’t a widely held and obvious truth to most people. It’s all over the infosec community though. Lots of private info is leaking this way.

        • If we go all dystopia, there is already a huge lobbying push to embrace ChatGPT even in elementary schooling. All this is based on the marketing claims that kids should know this new “AI” so that they “won’t be left behind”. If the future labor markets will be about spamming machine-generated content online, I can understand the calls. What is more alarming is the idea of “personalized learning”, i.e. kids will be fed content from unknown entities based on the same logic that drives social media. With the attention deficiencies caused by TikTok and the like, increasing lack of critical thinking skills and media illiteracy, etc., this will surely go down well.

          And all these things are already happening. For instance, judges in some parts of the world have already consulted ChatGPT apparently without any understanding of its flaws:

          https://techpolicy.press/judges-and-magistrates-in-peru-and-mexico-have-chatgpt-fever/

          As for science, I agree that we’ll probably see a flurry of fake paper submissions, including those with fake data generated by “AI”. Given that some publishers like IEEE have already had big problems with fake conferences, ChatGPT will make it possible to orchestrate these scams more easily and in an automated manner. Likewise, the Internet will likely see an explosion of predatory journals, although the word “predatory” seems outdated because the intention will likely be to deceive the public even more.

        • Don’t forget that we have evidence that judges rule differently before and after lunch and police give fewer tickets to motorists who have the same first name and…

          Again, the point is not to minimize the dangers of AI but to realize that the human standard has become a lower and lower bar. I can easily list reasons why AI is more dangerous than these examples of faulty human behavior, but that somehow feels like a fairly weak argument against AI.

        • To clarify: I am far from being a so-called Luddite. I just see that things are going in the wrong direction, and quite rapidly.

          For starters, in Europe we tend to call this “AI” as automated decision-making systems (ADMs).

          Take public sector administrations as an example: here there are great benefits from streamlining if done properly. While law enforcement likely continues to be a highly controversial case, some advances supported by the majority of citizens have already occurred; these include things like ADMs designed to catch tax fraud, money laundering, and the like. On the other hand, there have been several lawsuits in Europe regarding questionable and discriminatory ADMs for things like social security benefits. In many of these cases the public administrations have lost and have even been ordered to compensate the plaintiffs.

          As for ADMs in jurisprudence in general, I have read quite a lot of literature on the topic. In some smaller civil cases an “AI judge” may indeed be better than a human judge, and so forth.

          So, now, my criticism generally centers on the outsourcing of public sector ADMs to private sector companies and, in the worst-case scenario, to highly unreliable things like ChatGPT. Both the U.S. and the EU and its member states have huge databases of legislation and case law. If they want to build ADMs upon these things, why not open source everything so that everyone could scrutinize the ADMs involved?

          The same goes for education. There are huge databases held by schools and universities. Hence: if you must do AI for learning, why not train upon these instead of outsourcing things to some questionable billionaires?

          Of course, these are mostly high hopes: things like Microsoft Learn and Office 365 leak everything to the company. So who knows what the company does with the stuff; probably it will build LLMs based on the data it has acquired, and since kids’ privacy has been breached, you get “personalized learning” as a bonus. The same goes for Google Education and whatnot.

          If you are fine as an educator for these companies doing their questionable things, then I won’t object. But please be aware that they are profiling kids for later monetization in their adulthood.

        • DL wrote:

          “In fact, this is certainly how the brain works”

          That’s the old story. The latest edition of the standard neuroscience text (Kandel et. al. sixth edition (you’ll need a dedicated, and really sturdy, bible stand for it if you buy it…)) points out that areas that were thought to “do X” turn out to do lots more than X, and that the idea that “Y is localized” is giving way to the idea that large numbers of areas work together in a systematic way to do Y. They still like the localization idea, it’s just that it’s not quite that simple.

          Speaking of references, I tried to persuade Bing to tell me who the first bloke who misquoted Chomsky (as “Colorless green _dreams_ sleep furiously” (it’s ideas, not dreams; sheesh)) was. It couldn’t figure it out and got insulted that I kept asking the same question. (I was going to excoriate some math types who used the misquote in an undergrad text, when I thought to check with St. Google and found there are lots of places that use the incorrect form, so I need to tone down my criticism a bit. The sin of being wrong in a horrifically ugly way is only slightly ameliorated by one’s not being the first to make the particular mistake at hand.)

      • Well, the output code is supposed to be the intellectual property of the employer, but OpenAI now has it. Once you use code written by ChatGPT, OpenAI has access to your source…

        • Well-raised points by Lakeland. Again, I am not currently in a position to verify my sources, but I think the open source community is also litigating against Microsoft for its alleged breaches of open source licenses. And regarding Europe, apparent breaches have also occurred regarding non-commercial personal data.

        • I’ll add to my own comment that it will be interesting to observe whether and how well jurisprudence will be able to address the issues or even maintain the existing legal order (regardless of region; I am speaking generally about the Western world). As litigation cases quite presumably pile up, I hope the lawyers and judges are up to the task (unless lawmakers intervene early). There is a huge list of issues involved: everything from defamation, trade secrets and the like to copyright, and all the way back to national constitutions.

        • Still updating this thread: there are some interesting things Google and associates have claimed about their right to store and process copyrighted content. For instance, in this paper:

          https://arxiv.org/pdf/2301.13188.pdf

          they claim that a Creative Commons license means the content is all up for grabs.

          But regarding scientific publishing, for instance, many papers published under OA, including those on arXiv, use different licenses, such as CC BY-NC-SA (i.e., only non-commercial use with attribution) and CC BY-NC-ND (in addition, no derivatives). We’ll see, but I think big science publishers such as Springer and Elsevier might also pursue litigation soon. Given these examples, I think the Googles of the world may have only weak defenses at their disposal.

        • Jukka, copyright doesn’t and shouldn’t protect people from using ideas they read about to create new works, as long as they aren’t copying that work or deriving substantially similar works (i.e., a play where all the names are changed but the dialog is otherwise the same, or a photograph where everything is the same except the face has been photoshopped in…).

          Cory Doctorow has a good article on this
          https://pluralistic.net/2023/02/09/ai-monkeys-paw/

        • I am well-aware of Doctorow’s position; after all, he has been a vocal critic of copyrights for a long time. But I am far from being sure how this will play out in courts, especially when considering that multiple jurisdictions are involved throughout the world.

          There are a lot of technicalities involved, of course. Regarding text, I think the big companies clearly are “copying” things; already the pre-processing requires the use of a cloud.

          But regarding science, I think the AI companies must fix the issue with proper referencing to the works they clearly are using. This has been the very cornerstone of science for centuries. Serious issues arise if e.g. an LLM based on PubMed is used to diagnose patients without knowing the exact studies involved. As we all too well know from this blog, not all science is exactly reliable to give advice on anything…

        • Replying to Jukka, who wrote:

          “But regarding science, I think the AI companies must fix the issue with proper referencing to the works they clearly are using. This has been the very cornerstone of science for centuries. Serious issues arise if e.g. an LLM based on PubMed is used to diagnose patients without knowing the exact studies involved. As we all too well know from this blog, not all science is exactly reliable to give advice on anything…”

          My understanding of the underlying technology is limited (I’m not willing to do the work to figure out the details because, to the best I can tell, it’s essentially a parlor trick), but to the best I can tell the LLM technology precludes producing references; it converts its input data to an enormous matrix of probabilities and token strings and reconstructs (generates, whatever) its output from a data structure stripped of references.

          So its thing of producing bogus URLs isn’t a glitch that can be fixed; it’s an inherent and basic property of this particular parlor trick (oops: “technology”).

        • David in Tokyo is right, but it’s possible to have multiple components to the whole parlor trick. In fact, this is certainly how the brain works (there is a “language processing region” and there is a “visual processing region” and there are various “emotional” centers, and the visual processing region probably has sub-regions designed to analyze faces, and those are connected to the emotional regions in various ways etc)

          So, an LLM which generates references could have its references checked by a “reference checking” component, and then stripped if they are not real etc.

        • Replying to DL, who said

          “So, an LLM which generates references could have its references checked by a “reference checking” component, and then stripped if they are not real etc.”

          Well, yes. But. The output of the LLM is a random sequence of undefined tokens, so the problem of reference checking is equivalent to the problem of understanding natural language (i.e. the RE module has to guess what the user will think the LLM output “means”). Which we still aren’t doing very well at…

          Although in real life, it won’t be that hard. Bing seems to do a bunch of ordinary keyword (indexed) searches on various subsets and recombinations of the user query and uses the results of those as input to the LLM, and that seems to keep it from going off the rails quite as crazily as ChatGPT. And it already has the links before it even calls the LLM, so it doesn’t even need to do that.

      • Daniel,
        If I wanted to keep this particular modeling approach secret, I wouldn’t have put it in this blog post.

        Your concern is a legitimate one — and indeed I think the situation is even worse for GitHub Copilot, which is scaring me a little in the code it suggests: it sometimes includes fragments of other people’s work that are not just standard code, e.g. keywords and such. If someone hard-codes a password, would that ever get suggested to me as an autocomplete? I think it could. And even if it couldn’t, the fact that the code is uploaded to a third party for analysis is very worrisome. I would not use copilot with any file that contains anything whatsoever that I wish to keep secret.

        But using ChatGPT4 in the way I’ve done here is very different. It is not looking at my code — unless I copy and paste a chunk for it to look at. I can use it to write lines, functions, or complete programs, and as long as I’m not telling it anything I wouldn’t post on this blog I see no risk in doing that.

        So: if you’re saying you won’t use it for _everything_, hey, I’m right there with you. But if you’re saying you won’t use it for _anything_, which is what I think you’re saying, then I think you are needlessly depriving yourself of an extremely useful tool.

        • Basically I'm saying I'll start using it when OpenAI or any other extractive organization isn't spying on the process. I don't use Gmail, I don't own a smart TV, I run my house automation on 100% locally controlled devices with Home Assistant, I use Signal not SMS messaging, I don't have an Echo or a Google Home device, etc etc. Similarly I won't be letting OpenAI watch me design computer programs.

        • I chart a middle path. When I’m doing regular web searching, I go ahead and use Google, no VPN or anything. But when I’m searching for how to dispose of a body, I take precautions.

        • Phil:

          I just watch old Columbo episodes: they reveal all the mistakes that you could possibly make. The good news is that my bike has no odometer, so they’ll never be able to catch me that way!

        • I use Firefox exclusively, and the temporary containers extension, and the enhanced tracking protection. My search engine is startpage which proxies between me and google’s search results. I mean, I still have an android phone… because you gotta have a phone and there’s no way I’m going to borrow one from Apple (Apple owns your phone, since it decides what can be installed on it… only approved apps from its app stores). But my android phone has location history turned off, the “voice assistant” is turned off, the web search history is turned off…

          I care about self-determination and privacy, and I think more people should. It’s nothing to do with dead bodies. Would you install a keylogger on your laptop that sends your keystrokes to google? Why or why not?

        • How are google’s search results these days? About 10 years ago they were too often junk so I had to stop using it. Either you got an echo chamber of your previous searches or whatever the most popular (paid for?) links were.

          Say there was a famous band named “Bayes”, then all the top results would be about the band rather than the technical topic. At that point DDG became superior to startpage.

          But the last few years DDG results have been degrading for the relatively obscure topics I usually search for. Google, of course, is a company that profits off ignorance regarding the value of people’s personal data. But maybe startpage is worth a try again.

        • Anoneuoid:

          Google search results through startpage are not terrible. That's as far as I'll go. They're way less good than they used to be, it's true.

          See Cory Doctorow’s enshittification thesis: https://www.wired.com/story/tiktok-platforms-cory-doctorow/

          or see any of his followup articles. It seems that he personally has been subject to this kind of thing, with, for example, his hundreds of thousands of Twitter followers almost never seeing his posts because Twitter prefers they see other stuff that makes Twitter more money (and this happened before Elon Musk came along). He's since moved to Mastodon and loves it there because people who want to see his stuff get to see his stuff, no intermediary in the way.

          Google is the same thing, they want to steer you to places you’ll see google ads. The search engine is just a way for you to get steered to their ads, and they’ll maximize that ad revenue, not your satisfaction.

          startpage at least puts a barrier between you and them so they don’t know who it is they’re serving results to, and you’ll get more averaged results instead of stuff designed to make you personally into a money maker for them.

        • Daniel,

          I clicked through and read the Doctorow article. It’s compellingly written. I don’t know how much to believe of it, but a lot of it made sense to me.

          One thing that confused me, though: he writes:

          I have ~500k followers on Twitter and my threads used to routinely get hundreds of thousands or even millions of reads. Today, it’s hundreds, perhaps thousands.

          I thought the way twitter worked was that if you’re a follower of Doctorow, you’d see all of his posts by default. Isn’t that what “following” does? Or is it more complicated, that “following” just increases the probability you’ll see a thread?

        • To still continue this discussion a little: the EU’s Digital Services Act (DSA) aims to solve some of these issues by granting vetted researchers and others access to data and algorithms of large platforms. Besides the well-known commercial issues (such as the price discrimination, promotion inequalities, etc. raised by Doctorow), there are all kinds of interesting things for academic researchers, civil society groups, journalists, regulators, and others to examine. For instance, some initial probes done in Europe indicate that TikTok is pushing heavy Chinese propaganda to *some particular* users (even though, at the same time, its recommender system is perceived as excellent). Quite a few papers have also examined alleged or real radicalization pathways offered by YouTube’s recommender algorithm. And so forth.

          As for research, here is an idea on how the DSA might work in practice:

          https://algorithmwatch.org/en/dsa-data-access-explained/

          There has been a growing interest in the DSA also in some parts of the US, as expressed e.g. by the NYU school of engineering. Also political scientists throughout the world should have an interest in these things.

        • Andrew, yes that's what everyone thinks, and probably in the early days of Twitter that's mostly what it did, and then Twitter slowly optimized for its own interests and now it doesn't do that. No one actually knows what it does. I've still got a Facebook account but it has enshittified so much that my wife can post a picture of something we did and I don't even get a notification; it might as well never have happened. I went away for a couple of weeks and when I came back and checked my FB feed it was literally entirely one person's posts being shown. And that wasn't because other people didn't do stuff.

          These platforms optimize for their own concerns. And his thesis is that they start out optimizing the user experience to attract users, then they optimize for the advertisers' concerns, then eventually they optimize for their own concerns. I think it's accurate.

          Mastodon is not like that: it's not run by a massive corporation optimizing its own interest, it's run by ~10k community organizations, each one providing a separately run server, all talking together. Will sdf.org, a bunch of computer and anime nerds, start fiddling with my timeline of posts? Seems highly unlikely. So I'll be getting posts of the people I subscribe to, 100%.

          The open collaborative web, like this blog, is what concerns me. I don’t give two figs about Amazon or Facebook, or Twitter or OpenAI. They can get stuffed.

        • The DSA link is an interesting read. It looks a lot like a FOIA request in the US for government information. The extensive vetting process is a bit daunting. Since it is commercial data, I suppose care must be taken, but it seems like an awful lot of paperwork, time, and layers of bureaucracy in order to access data from these platforms. And, if you get the data to do some research, I imagine that nobody else will get to see the data you used (since it is proprietary and you've guaranteed to protect it). So, any research you do, and publications you have, will not provide data for others to replicate or reproduce.

          But, it is better than nothing.

        • Daniel:

          Wait. I still don’t understand. I’m not on twitter myself so I’m sure that’s part of the confusion . . .

          Anyway, here’s my question. If I’m on twitter and I click on “follow” for Cory Doctorow, don’t I see all his posts on my feed? If not, what happens? I only see some of his posts? This is hard for me to believe. The whole point of following people is that you see their posts, no?

        • I'm technically signed up to Twitter but I don't use it, so I have to take Cory's word for it. The answer is no, according to him. He has 500k Twitter followers, but the number of people seeing anything he posted eventually dwindled to effectively nothing, a few hundred maybe. Twitter has internal algorithms that decide what Twitter users see; having followed someone is only one input to that algorithm. The same is true for Facebook: there are a bunch of people who are important to me in real life who post stuff to Facebook that I don't see, like my wife, or friends from as far back as 5th grade, or siblings, or whatever.

          So, no, you don't just see “all the stuff you're subscribed to”. You see what Twitter wants you to see. Some people may not detect the difference, but others may have a massively distorted feed.

          Fuck that, am I right?

          People talk about Mastodon as a Twitter replacement, and how frustrated they are with it, but it’s not a Twitter replacement, it’s a very different architecture and a very different point of view. I recommend it, you’d like it I think, it’s much more like your blog or the Stan discourse than it is Twitter.

        • https://pluralistic.net/2023/01/21/potemkin-ai/#hey-guys

          That's the original link for the enshittification essay, for anyone who has the Wired version asking for a subscription.

          Note that this is not just Twitter or just TikTok. You can search Amazon directly for the brand name of a product that Amazon DOES sell, and yet Amazon doesn't show it to you; it shows you pages full of cheap Chinese knockoffs with brand names like GLABAROO or whatever.

          How long before OpenAI inserts stuff into code you ask it to write because the NSA asked it to? How long before it intentionally adds bugs, delays, and excess complexity to stuff that appears to compete with people who pay it money?

          Of course you'll never know it's doing that. It'll just slowly get less useful: slowly you'll find that what used to take you 20 minutes now takes you 90, and it will send you links to web pages to check out info, and they will exist, but it's just blather you have to read through to realize it's not helpful, and there are ads of course, ads that OpenAI gets a cut from showing you… or whatever.

          No thanks. I'm done with that crap. I won't be using big “platforms” and I encourage you all to be aware of this dynamic and join in the backlash. No more PNAS, no more Twitter, no more OpenAI. If it runs on your own computer, or on a cloud computer but using software you install and you control what it does, then yes. Otherwise f*** that.

        • Let's also not forget what a tremendous resource for spam ChatGPT is going to be:

          https://social.sdf.org/@indrora/110233077775636843

          Shows just the tip of the iceberg. Soon every commerce site on the web could be 90% utterly manufactured-out-of-thin-air garbage (it's already 20-40%). There will be an enormous arms race here. ChatGPT will write the descriptions of crap products, and competitors' products will be filled with bad reviews written by ChatGPT and uploaded by Bangladeshi click farmers wielding scripts. Computer-generated unboxing and review videos will masquerade as humans praising products that don't work… The future is now, and if you think spam emails were a problem in 1999 that spam filters mostly handled, you ain't seen nothing yet. Just wait until the science literature is being auto-spammed with articles “replicating” the power pose literature from data manufactured out of whole cloth at the rate of 100k articles a day. Imagine how quickly you could apply for govt research grants if you're the next Brian Wansink and have ChatGPT to assist.

  8. I'd also strongly suggest people experiment with it and begin to adopt it. It's incredibly useful (and cheap), so much so that I don't think there is any going back to the world without it. I've found it useful for debugging Python scripts, for speeding up visualization creation using ggplot/matplotlib, and have even toyed around with having it set up different regression models. Of course you should not feed it any sensitive information/trade secrets, and it's not 100% reliable, but still I've been blown away by its capabilities and I too always keep a window open while working. I think even if you go into it with high expectations of its capabilities, it'll likely surpass them and you'll be amazed. Also keep in mind that most people using it are beginners at formulating prompts; as your experience increases, its value will too.

    For those of you who teach, I think it would also be very useful for students who are learning to use R/Python. Besides just being able to solve/debug problems, it does a good job explaining why errors occur. I can still remember many of my classmates being discouraged just because they were having issues installing LaTeX/RStudio/Sweave; a tool like this would've helped them greatly. There was a post on this blog recently about the difficulty of teaching/learning both programming and statistics at the same time; ChatGPT can really ease the burden on students/professors. Though of course, the potential for misuse (i.e. cheating) is large and obvious.

  9. Working with ChatGPT4 for this task was like working with a good intern programmer who is rather forgetful and a bit scattered. At one point it gave me a line like target -= normal_lcdf(…). I said “target -= looks weird, is that really acceptable syntax?” and it said Yes! (It isn't; see the sketch at the end of this comment for the valid form.)

    This error might be due to sparsity. I've noticed a category of stubborn, persistent bugs in GPT-3/4, particularly obvious in coding problems, which typically come down to a single character/word and which aren't fixed even when explicitly highlighted. Qualitatively, they seem to match what one might expect from internal sparsity leading to irreversible information loss within the model layers: https://reddit.com/r/slatestarcodex/comments/1201v68/10word_quote_a_short_and_simple_failure_mode_of/jdjsx43/

    Still, I’m overall impressed. Coding in Stan is little like coding in a regular programming language, and there is not much Stan code online to begin with (even assuming they scraped it all rather than missed much of it for being in books or papers).
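    For reference, the only compound statement Stan provides for incrementing the log density is target +=; there is no target -=. Below is a minimal sketch of the syntactic fix, borrowing y_high and y_hat[t] from the model in the post; repulsion_scale is just an illustrative name for whatever scale parameter the soft constraint uses, not something from the actual model.

      // Invalid Stan: there is no `target -=` statement.
      // target -= normal_lcdf(y_high - y_hat[t] | 0, repulsion_scale);

      // Valid Stan: negate the term and use `target +=` instead.
      target += -normal_lcdf(y_high - y_hat[t] | 0, repulsion_scale);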

  10. It seems to depend massively on the programming language and packages you use. The more widely they are used, the better it is (reflecting the amount of training data), and probably the same goes for how common/standard your problem is. It's great for getting common tidyverse things. Then I tried modeling with brms and it looked plausible; the models often fit, but were usually pretty wrong in obvious (and occasionally less obvious) ways, like keeping a normal family when I asked for “logistic regression”. Finally, I tried to solve a novel problem (I'm pretty sure this hasn't been done yet, because I googled extensively before doing it myself) using the torch R package. That was a complete disaster. It kept slipping into Python and kept using things that exist in PyTorch but not in the R package. I'd guess torch has pretty low usage as an R package (relatively speaking) and this shows.

  11. Disclaimer: although my engineering department considered using neural networks as part of our design automation and I did some study for that in the 1990s, I am not an expert. There has been an ongoing discussion of ChatGPT at Scott Aaronson's blog for several months now which I have followed, but I'm still not an expert. I have, however, seen many examples of useful and/or impressive work by ChatGPT and especially GPT-4 (such as getting A's on a couple of graduate-course final exams, in Economics and Organic Chemistry; I would have gotten F's). My first impression was that it was an AUCS (Artificial Used Car Salesman), but I have evolved from that to: ChatGPT is a story-teller. Many of its prompts ask it to make up a story or poem, and from its point of view, every request is a request for a story, which it makes up based on what it processed from the Internet during its training.

    Based on that hypothesis, it does well (better than the average human anyway) at requests for code routines, because there isn't a lot of misinformation about coding on the Internet with which to construct coding stories. (Although the “-=” syntax may be an example.)

    My suggestion on the Shtetl-Optimized threads has been to restrict the information it trains on to accredited sources, such as online textbooks and encyclopedias. Those sources may not be rich enough to generate its remarkable language ability, though. I have also suggested adding a checking pass, based on anecdotal data that follow-up prompts asking it to critique the answer it just gave sometimes produce corrections. (That last sentence may be complicated to parse; ChatGPT can parse it though.) It also needs access to separate routines for calculations and testing code. (There is a way to connect it to Wolfram Alpha.)

    Its future development and use or misuse is in our (humanity’s) hands. As usual, if good people are not active in the process, evil ones will prevail. I think it would be a tremendous accomplishment to (someday) make something which thinks better than we do.

  12. When deciding whether to use something like ChatGPT, I've been considering the social costs of training the model…

    E.g. criticisms like: https://time.com/6247678/openai-chatgpt-kenya-workers/

    I realize that for most of the uses I would put it to, a training set like Stack Overflow would do fine, and probably not require the kind of ethically challenging activities referenced there, so maybe I could feel good about using something that was more specialized and had some guarantee that no humans were harmed in the making of it.

    But there is maddeningly little info provided on how these things are trained, making it easy to ignore any social costs.

    I also wonder about destroying the ecosystems that produce the intelligence being parroted back. What happens if we destroy the viability of ecosystems like Stack Overflow? How long does this remain good? I think finding a way to make sure the training material is correctly cited and compensated is actually important.

    So, my concerns are very separate from the ones raised, which are more along the lines of “does it work?” and “will it leak my private info?”, but I'm still hesitant to adopt this sort of approach, even if it shows short-term effectiveness, because of ethical concerns about how it was trained.

  13. Interesting results.

    My two takeaways are:

    – That training material might or might not be relevant for answer quality. A good example was when playing with ChatGPT4: I tricked it into `writing` a brainfuck interpreter to yield results from brainfuck inputs. So if you ask whether the output compiles or what the result is, it will attempt to `execute` it.

    – Using these large language models for programming yields the same results as upgrading from a handsaw to a circular saw when doing woodworking. You’ll likely get refined results by being an artisan, but most people won’t care about how you did it.
