scala – Replace Special characters of column names in Spark dataframe


df
  .columns
  .foldLeft(df) { (newdf, colname) =>
    newdf.withColumnRenamed(colname, colname.replace(" ", "_").replace(".", "_"))
  }
  .show()
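The rename itself is plain string manipulation, so the logic can be checked without a SparkSession. A minimal sketch of the chained replace calls above (the sanitize helper name is my own, not part of the Spark API):

```scala
// Mirrors the per-column logic inside the foldLeft above:
// replace spaces and dots in a column name with underscores.
// sanitize is a hypothetical helper name used only for illustration.
def sanitize(colname: String): String =
  colname.replace(" ", "_").replace(".", "_")

// Illustrative column names (assumed, for demonstration only):
val renamed = Seq("Main CustomerID", "2.5 Ethylhexyl Acrylate").map(sanitize)
// renamed: Seq("Main_CustomerID", "2_5_Ethylhexyl_Acrylate")
```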

You can use withColumnRenamed with regex replaceAllIn and foldLeft, as below:

val columns = df.columns

val regex = "[+._, ]+"
val replacingColumns = columns.map(regex.r.replaceAllIn(_, "_"))

val resultDF = replacingColumns.zip(columns).foldLeft(df) { (tempdf, name) =>
  tempdf.withColumnRenamed(name._2, name._1)
}

resultDF.show(false)

which should give you

+---------------+---------------+-----------------------+
|Main_CustomerID|126_Concentrate|2_5_Ethylhexyl_Acrylate|
+---------------+---------------+-----------------------+
|725153         |3.0            |2.0                    |
|873008         |4.0            |1.0                    |
|625109         |1.0            |0.0                    |
+---------------+---------------+-----------------------+
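The regex half of this answer can also be exercised on plain strings: any run of the characters `+`, `.`, `_`, `,`, or space collapses into a single underscore. A small sketch (the sample input names are assumed for illustration, chosen to match the output columns shown above):

```scala
// Same pattern as in the answer: one or more of + . _ , or space
val regex = "[+._, ]+"

// Hypothetical raw column names, before cleaning:
val cleaned = Seq("Main CustomerID", "126 Concentrate", "2.5 Ethylhexyl+Acrylate")
  .map(regex.r.replaceAllIn(_, "_"))
// cleaned: Seq("Main_CustomerID", "126_Concentrate", "2_5_Ethylhexyl_Acrylate")
```

Because the character class is quantified with `+`, adjacent special characters (like `.` followed by a space) produce one underscore rather than two.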

I hope the answer is helpful.


In Java you can iterate over the column names using df.columns() and replace each header string with headerName.replaceAll(regexPattern, intendedReplacement).

Then use withColumnRenamed(headerName, correctedHeaderName) to rename the df header.

e.g.:

for (String headerName : dataset.columns()) {
    // replaceAll takes a regex, so "+" must be escaped
    String correctedHeaderName = headerName.replaceAll(" ", "_").replaceAll("\\+", "_");
    dataset = dataset.withColumnRenamed(headerName, correctedHeaderName);
}
dataset.show();

